Improving Language Understanding from Screenshots

Tianyu Gao, Zirui Wang, Adithya Bhaskar, Danqi Chen

🏛 Institutions: Princeton Language and Intelligence (PLI), Princeton University
📅 Date: February 21, 2024
📑 Publisher: arXiv
💻 Env
🔑 Keywords: screenshot language models, PTP, patch-and-text prediction, language understanding, plain-text-rendered screenshots

TLDR

This paper studies screenshot language models in a simplified setting where plain text is rendered as screenshots, and improves them with a patch-and-text prediction (PTP) objective. It is relevant here because screenshot pretraining can transfer to UI-style inputs, but the paper addresses general language understanding from screenshots rather than GUI-agent behavior directly.
