GUI Agents Papers

GUI-Eyes: Tool-Augmented Perception for Visual Grounding in GUI Agents

Chen Chen, Jiawei Shao, Dakuan Lu, Haoyi Hu, Xiangcheng Liu, Hantao Yao, Wu Liu

🏛 Institutions
USTC, Institute of Artificial Intelligence (TeleAI), China Telecom, Shanghai Innovation Institute, SJTU
📅 Date
January 14, 2026
📑 Publisher
arXiv
💻 Env
General GUI
🔑 Keywords
TLDR

GUI-Eyes frames GUI grounding as active perception, letting the agent learn when and how to call tools such as cropping and zooming inside a two-stage reasoning process. It pairs that policy with a spatially continuous reward for tool use and reaches 44.8% grounding accuracy on ScreenSpot-Pro using only 3k labeled samples.
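
The crop-and-zoom idea can be made concrete with a small sketch. The code below is illustrative only and is not taken from the paper: the `Box` type, the helper names, the Gaussian form of the reward, and the `sigma` value are all assumptions. It shows two things the summary implies: a point predicted on a zoomed crop has to be mapped back to full-screenshot coordinates, and a spatially continuous reward can score a crop by how close it lands to the target rather than by a hit-or-miss check.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in screenshot pixels (hypothetical helper type)."""
    left: float
    top: float
    right: float
    bottom: float

    @property
    def center(self):
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)

    def contains(self, x, y):
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def crop_to_screen(point_in_crop, crop, zoom):
    """Map a point predicted on a zoomed crop back to full-screenshot pixels."""
    px, py = point_in_crop
    return (crop.left + px / zoom, crop.top + py / zoom)


def continuous_crop_reward(crop, target, sigma=200.0):
    """Assumed reward shape: Gaussian falloff in the distance between the
    crop center and the target center. It equals 1.0 when the centers
    coincide and decays smoothly toward 0 as the crop drifts away."""
    cx, cy = crop.center
    tx, ty = target.center
    dist_sq = (cx - tx) ** 2 + (cy - ty) ** 2
    return math.exp(-dist_sq / (2 * sigma ** 2))


# Stage 1: the agent picks a region to zoom into (values are made up).
target = Box(1000, 500, 1040, 520)
crop = Box(900, 400, 1200, 600)
zoom = 2.0

# Stage 2: the agent predicts a click on the zoomed crop; map it back.
click = crop_to_screen((240, 220), crop, zoom)
hit = target.contains(*click)
reward = continuous_crop_reward(crop, target)
```

A smooth reward like this gives a learning signal even when the crop only partly covers the target, which a binary hit-or-miss score would not. The actual reward used by GUI-Eyes may differ in form and inputs.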
