GUI Agents Papers

SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents

Kanzhi Cheng, Qiushi Sun, Yougang Chu, Fangzhi Xu, Yantao Li, Jianbing Zhang, Zhiyong Wu

🏛 Institutions
National Key Laboratory for Novel Software Technology, NJU, Shanghai AI Laboratory
📅 Date
January 17, 2024
📑 Publisher
ACL 2024
💻 Env
Desktop, Mobile, Web
🔑 Keywords
TLDR

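SeeClick is a visual GUI agent that operates on screenshots alone instead of structured representations such as HTML. The paper identifies GUI grounding, the ability to locate screen elements from an instruction, as the key challenge for such agents. It addresses this with GUI grounding pre-training on automatically curated data and introduces ScreenSpot, a grounding benchmark spanning mobile, desktop, and web environments.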
