GUI Agents Papers

Improving Language Understanding from Screenshots

Tianyu Gao, Zirui Wang, Adithya Bhaskar, Danqi Chen

🏛 Institutions
Princeton Language and Intelligence (PLI), Princeton University
📅 Date
February 21, 2024
📑 Publisher
arXiv
TLDR

This paper studies screenshot language models in a simplified setting where the input is plain text rendered as an image, and improves them with a patch-and-text prediction objective. It is relevant here because screenshot pretraining can transfer to UI-style inputs, but the paper addresses general language understanding from screenshots rather than GUI-agent behavior directly.
