GUI Agents Papers

GUI-Eyes: Tool-Augmented Perception for Visual Grounding in GUI Agents

Chen Chen, Jiawei Shao, Dakuan Lu, Haoyi Hu, Xiangcheng Liu, Hantao Yao, Wu Liu

🏛 Institutions
USTC, Institute of Artificial Intelligence (TeleAI), China Telecom, Shanghai Innovation Institute, SJTU
📅 Date
January 14, 2026
📑 Publisher
arXiv
💻 Env
General GUI
🔑 Keywords
TLDR

GUI-Eyes frames GUI grounding as active perception, letting the agent learn when and how to call tools such as cropping and zooming inside a two-stage reasoning process. It pairs that policy with a spatially continuous reward for tool use and reaches 44.8% grounding accuracy on ScreenSpot-Pro using only 3k labeled samples.
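The crop-then-ground flow and the idea of a spatially continuous tool reward can be illustrated with a minimal sketch. This is not the paper's implementation: the `Box` type, the `crop_to_screen` helper, the Gaussian form of `continuous_tool_reward`, and the `sigma` value are all assumptions chosen for illustration. The summary above only states that the tool-use reward is spatially continuous, not what formula it uses.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in screen pixels (hypothetical helper type)."""
    left: float
    top: float
    right: float
    bottom: float

    @property
    def center(self):
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)

    def contains(self, x, y):
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def crop_to_screen(crop, x_rel, y_rel):
    """Map a point given in normalized crop coordinates ([0, 1] on each axis)
    back to full-screenshot pixel coordinates."""
    return (
        crop.left + x_rel * (crop.right - crop.left),
        crop.top + y_rel * (crop.bottom - crop.top),
    )


def continuous_tool_reward(crop, target, sigma=200.0):
    """Assumed reward shape: decays smoothly (Gaussian) with the distance
    between the crop center and the target center, so a near-miss crop
    still earns partial credit instead of a hard 0/1 signal."""
    cx, cy = crop.center
    tx, ty = target.center
    dist_sq = (cx - tx) ** 2 + (cy - ty) ** 2
    return math.exp(-dist_sq / (2 * sigma ** 2))


# Stage 1: the agent chooses a region to zoom into.
crop = Box(800, 400, 1200, 600)
target = Box(1000, 500, 1040, 520)  # ground-truth element, used only for the reward
tool_reward = continuous_tool_reward(crop, target)

# Stage 2: the agent predicts a point inside the crop, which is mapped back.
x, y = crop_to_screen(crop, 0.5, 0.25)
```

A smooth reward of this kind gives the policy a gradient toward better crops even when the first attempt misses the target, which is one plausible reason to prefer it over a binary hit-or-miss signal.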
