GUI Agents Papers

Think Twice, Click Once: Enhancing GUI Grounding via Fast and Slow Systems

Fei Tang, Yongliang Shen, Hang Zhang, Siqi Chen, Guiyang Hou, Wenqi Zhang, Wenqiao Zhang, Kaitao Song, Weiming Lu, Yueting Zhuang

🏛 Institutions
ZJU, MSR Asia
📅 Date
March 9, 2025
📑 Publisher
arXiv
💻 Env
General GUI
🔑 Keywords
TLDR

Focus is a GUI grounding model that switches between fast prediction and slower, more deliberate analysis depending on task complexity. It decomposes grounding into three stages (summarization, focused visual analysis, and coordinate prediction) and performs strongly on the ScreenSpot and ScreenSpot-Pro benchmarks with a 2B-parameter model trained on 300K examples.
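
The control flow described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function `ground`, the `GroundingResult` type, the component names, and especially the word-count complexity check are all hypothetical placeholders chosen for illustration. In the paper the switching and each stage are handled by the trained model itself.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple

Point = Tuple[float, float]  # normalized (x, y) click position


@dataclass
class GroundingResult:
    point: Point
    mode: str                      # "fast" or "slow"
    trace: List[str] = field(default_factory=list)  # intermediate stage outputs


def ground(
    instruction: str,
    screenshot: Any,
    *,
    is_complex: Callable[[str, Any], bool],
    fast_predict: Callable[[str, Any], Point],
    summarize: Callable[[Any], str],
    analyze_region: Callable[[str, Any, str], str],
    predict_point: Callable[[str, Any, str], Point],
) -> GroundingResult:
    """Route a grounding request through a fast or a slow path (hypothetical)."""
    # Fast path: predict coordinates directly when the task looks simple.
    if not is_complex(instruction, screenshot):
        return GroundingResult(fast_predict(instruction, screenshot), "fast")

    # Slow path: summarization, then focused visual analysis, then coordinates.
    summary = summarize(screenshot)
    focus = analyze_region(instruction, screenshot, summary)
    point = predict_point(instruction, screenshot, focus)
    return GroundingResult(point, "slow", [summary, focus])


# Toy stand-ins so the sketch runs; every one of these is an assumption.
TOY_COMPONENTS = dict(
    is_complex=lambda instr, shot: len(instr.split()) > 4,  # toy heuristic only
    fast_predict=lambda instr, shot: (0.5, 0.5),
    summarize=lambda shot: "editor window with a toolbar",
    analyze_region=lambda instr, shot, summary: "toolbar, top-left area",
    predict_point=lambda instr, shot, focus: (0.1, 0.05),
)

if __name__ == "__main__":
    print(ground("click OK", None, **TOY_COMPONENTS))
    print(ground("open the export dialog in the toolbar", None, **TOY_COMPONENTS))
```

Passing the components in as callables keeps the routing logic separate from any particular model, so the same skeleton works whether the stages are separate prompts to one model or distinct modules.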
