GUI Agents Papers

Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents

Boyu Gou, Ruohan Wang, Boyuan Zheng, Yanan Xie, Cheng Chang, Yiheng Shu, Huan Sun, Yu Su

🏛 Institutions
OSU, Orby AI
📅 Date
October 7, 2024
📑 Publisher
ICLR 2025 (Oral)
💻 Env
Desktop Mobile Web
TLDR

This paper introduces UGround, a universal GUI visual grounding model trained on 10M element-expression pairs across 1.3M screenshots spanning web, mobile, and desktop interfaces. It makes the case for vision-only GUI agents that act via pixel-level coordinates, and shows that UGround improves grounding accuracy as well as both offline and online agent performance across six benchmarks.
