GUI Agents Papers

Visual Grounding for User Interfaces

Yijun Qian, Yujie Lu, Alexander Hauptmann, Oriana Riva

🏛 Institutions
CMU, UC Santa Barbara, Google Research
📅 Date
June 16, 2024
📑 Publisher
NAACL 2024 Industry Track
💻 Env
General GUI
TLDR

This paper defines visual UI grounding, where a model must localize the UI element referenced by a natural-language command directly from a screenshot without relying on UI metadata. It proposes LVG, which combines layout-guided contrastive learning with synthetic-to-real multi-context learning and improves top-1 accuracy by more than 4.9 points over strong baselines.
