GUI Agents Papers

See, Point, Refine: Multi-Turn Approach to GUI Grounding with Visual Feedback

Himangi Mittal, Gaurav Mittal, Nelson Daniel Troncoso, Yu Hu

🏛 Institutions
Microsoft, CMU
📅 Date
April 14, 2026
📑 Publisher
arXiv
💻 Env
General GUI
🔑 Keywords
TLDR

See-Point-Refine reframes GUI grounding as an iterative loop where the agent points, observes visual feedback, and refines, targeting editing-level grounding in dense coding interfaces that require sub-pixel accuracy. The multi-turn formulation improves grounding precision over single-shot baselines by recovering from initial errors using closed-loop visual evidence.
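The point, observe, refine loop described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `refine_click`, `toy_propose`, `toy_observe`, the half-step correction rule, and the tolerance value are all hypothetical stand-ins. In a real system the observation would be a screenshot with the predicted click rendered on it, and a model would read the remaining offset from that image; here the offset is computed directly so the example stays self-contained.

```python
from typing import Callable, Optional, Tuple

Point = Tuple[float, float]


def refine_click(
    propose: Callable[[Optional[Point], Optional[Point]], Point],
    observe: Callable[[Point], Point],
    max_turns: int = 5,
    tol: float = 0.5,
) -> Tuple[Point, int]:
    """Iteratively refine a click location using feedback on each guess.

    propose(prev_guess, feedback) returns the next guess; both arguments
    are None on the first call. observe(guess) returns an (dx, dy) offset
    from the guess to the intended target (a stand-in for visual feedback).
    Returns the final guess and the number of observation turns used.
    """
    guess = propose(None, None)
    for turn in range(1, max_turns + 1):
        feedback = observe(guess)
        # Stop as soon as the observed offset is within tolerance.
        if abs(feedback[0]) <= tol and abs(feedback[1]) <= tol:
            return guess, turn
        guess = propose(guess, feedback)
    # Turn budget exhausted: the last guess was refined but never re-observed.
    return guess, max_turns


# Hypothetical toy setup: a fixed target and a deliberately imperfect proposer.
TARGET: Point = (120.0, 48.0)


def toy_observe(guess: Point) -> Point:
    # Offset from the guess to the target, standing in for what a model
    # would infer from a screenshot showing the rendered click marker.
    return (TARGET[0] - guess[0], TARGET[1] - guess[1])


def toy_propose(prev: Optional[Point], feedback: Optional[Point]) -> Point:
    if prev is None or feedback is None:
        return (100.0, 40.0)  # single-shot initial prediction, off target
    # Correct only half of the observed offset to mimic an imperfect refiner.
    return (prev[0] + 0.5 * feedback[0], prev[1] + 0.5 * feedback[1])


final, turns = refine_click(toy_propose, toy_observe, max_turns=8)
print(final, turns)
```

The single-shot baseline corresponds to stopping after the first `propose` call; the loop only helps when the feedback actually carries information about the remaining error, which is the role the rendered visual evidence plays in the multi-turn formulation.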
