GUI Agents Papers

SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents

Kanzhi Cheng, Qiushi Sun, Yougang Chu, Fangzhi Xu, Yantao Li, Jianbing Zhang, Zhiyong Wu

🏛 Institutions
National Key Laboratory for Novel Software Technology, NJU, Shanghai AI Laboratory
📅 Date
January 17, 2024
📑 Publisher
ACL 2024
💻 Env
Desktop, Mobile, Web
🔑 Keywords
TLDR

SeeClick is a visual GUI agent that operates on screenshots alone instead of structured representations such as HTML. It treats GUI grounding, the task of locating the screen element that an instruction refers to, as the core capability. The paper adds an automated pipeline for curating GUI grounding data and introduces ScreenSpot, a grounding benchmark spanning mobile, desktop, and web environments.
