Improving Language Understanding from Screenshots
Tianyu Gao, Zirui Wang, Adithya Bhaskar, Danqi Chen
- 🏛 Institutions
- Princeton Language and Intelligence (PLI), Princeton University
- 📅 Date
- February 21, 2024
- 📑 Publisher
- arXiv
- 💻 Env
- 🔑 Keywords
TLDR
This paper studies screenshot language models in a simplified setting where plain text is rendered into images, and improves them with a patch-and-text prediction objective. It is relevant here because screenshot pretraining can transfer to UI-style inputs, but the paper targets general screenshot language understanding rather than GUI-agent behavior directly.
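The TLDR does not say how the two prediction targets are combined. The toy sketch below assumes the objective is a weighted sum of two terms. The first is a masked-patch reconstruction loss computed only on masked patches, in the style of masked autoencoders. The second is a mean negative log-likelihood over the text tokens. The function names and the `text_weight` parameter are hypothetical. This is not the paper's implementation.

```python
import math
from typing import Sequence


def masked_patch_mse(pred: Sequence[Sequence[float]],
                     target: Sequence[Sequence[float]],
                     mask: Sequence[bool]) -> float:
    """Per-patch MSE averaged over masked patches only (MAE-style; an assumption)."""
    errors = []
    for p, t, m in zip(pred, target, mask):
        if m:  # unmasked patches contribute nothing to the reconstruction loss
            errors.append(sum((a - b) ** 2 for a, b in zip(p, t)) / len(t))
    return sum(errors) / len(errors) if errors else 0.0


def text_nll(token_probs: Sequence[float]) -> float:
    """Mean negative log-likelihood, given the model's probability of each correct token."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def ptp_style_loss(pred, target, mask, token_probs, text_weight: float = 1.0) -> float:
    """Hypothetical combined objective: patch reconstruction plus weighted text prediction."""
    return masked_patch_mse(pred, target, mask) + text_weight * text_nll(token_probs)


# Toy usage: two 2-value patches, only the second one is masked.
pred = [[0.0, 0.0], [1.0, 1.0]]
target = [[0.0, 0.0], [0.0, 2.0]]
mask = [False, True]
probs = [math.exp(-1), math.exp(-3)]
print(ptp_style_loss(pred, target, mask, probs, text_weight=0.5))
```

Weighting the text term separately makes the balance between visual reconstruction and text prediction explicit. How the paper actually sets that balance is not stated in this summary.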