GUI Agents Papers

Visual Grounding for User Interfaces

Yijun Qian, Yujie Lu, Alexander Hauptmann, Oriana Riva

🏛 Institutions
CMU, UC Santa Barbara, Google Research
📅 Date
June 16, 2024
📑 Publisher
NAACL 2024 Industry Track
💻 Env
General GUI
🔑 Keywords
TLDR

This paper defines visual UI grounding, where a model must localize the UI element referenced by a natural-language command directly from a screenshot without relying on UI metadata. It proposes LVG, which combines layout-guided contrastive learning with synthetic-to-real multi-context learning and improves top-1 accuracy by more than 4.9 points over strong baselines.
