Improved GUI Grounding via Iterative Narrowing
Anthony Nguyen
- 🏛 Institutions
- Algoma University
- 📅 Date
- November 18, 2024
- 📑 Publisher
- arXiv
- 💻 Env
- Desktop, Mobile, Web
- 🔑 Keywords
TLDR
Iterative Narrowing is a visual-prompting framework for GUI grounding that repeatedly zooms into smaller image regions to refine predictions. The paper shows that this simple test-time strategy improves both general and fine-tuned VLMs on one-shot grounding across multiple UI platforms.
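The TLDR describes the loop only at a high level. Below is a minimal sketch of that zoom-in loop, tracking crop boxes rather than pixels. It assumes a hypothetical `predict` callable that stands in for the VLM and returns a click point normalized to the current crop. The halving `shrink` factor, the three `iterations`, and the `biased_predictor` stub are illustrative assumptions, not values or components taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Box:
    """Axis-aligned crop in full-screenshot pixel coordinates."""
    left: float
    top: float
    width: float
    height: float


def centered_crop(center: Point, w: float, h: float, img_w: float, img_h: float) -> Box:
    """Return a w x h box centered on `center`, shifted so it stays inside the image."""
    cx, cy = center
    left = min(max(cx - w / 2, 0.0), img_w - w)
    top = min(max(cy - h / 2, 0.0), img_h - h)
    return Box(left, top, w, h)


def iterative_narrowing(
    predict: Callable[[Box], Point],  # hypothetical VLM wrapper: crop -> (u, v) in [0, 1]
    img_w: float,
    img_h: float,
    iterations: int = 3,  # illustrative default, not from the paper
    shrink: float = 0.5,  # each crop keeps this fraction of the previous side length
) -> Point:
    if iterations < 1 or not 0.0 < shrink <= 1.0:
        raise ValueError("need iterations >= 1 and 0 < shrink <= 1")
    box = Box(0.0, 0.0, img_w, img_h)  # first pass sees the whole screenshot
    for step in range(iterations):
        # The model answers in coordinates normalized to the current crop.
        u, v = predict(box)
        u, v = min(max(u, 0.0), 1.0), min(max(v, 0.0), 1.0)
        # Map the crop-relative answer back to full-screenshot pixels.
        point = (box.left + u * box.width, box.top + v * box.height)
        if step < iterations - 1:
            # Zoom in: the next crop is smaller and centered on the current guess.
            box = centered_crop(point, box.width * shrink, box.height * shrink, img_w, img_h)
    return point


# Hypothetical stand-in for a VLM whose error is always 5% of the crop size.
TARGET: Point = (830.0, 412.0)


def biased_predictor(box: Box) -> Point:
    u = (TARGET[0] - box.left) / box.width + 0.05
    v = (TARGET[1] - box.top) / box.height + 0.05
    return (u, v)


single_pass = iterative_narrowing(biased_predictor, 1920, 1080, iterations=1)
narrowed = iterative_narrowing(biased_predictor, 1920, 1080)
print(single_pass, narrowed)
```

With this stub on a 1920x1080 screenshot, a single pass lands at roughly (926, 466), while three narrowing passes land at roughly (854, 425.5). A model whose error is a fixed fraction of what it sees makes smaller absolute errors on smaller crops, which is the intuition the TLDR points to.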
Related papers
- VenusBench-GD: A Comprehensive Multi-Platform GUI Benchmark for Diverse Grounding Tasks · December 18, 2025 · arXiv
- ScaleTrack: Scaling and back-tracking Automated GUI Agents · May 1, 2025 · arXiv
- UI-TARS: Pioneering Automated GUI Interaction with Native Agents · January 21, 2025 · arXiv
- Ponder & Press: Advancing Visual GUI Agent towards General Computer Control · December 2, 2024 · Findings of ACL 2025
- OS-ATLAS: A Foundation Action Model for Generalist GUI Agents · October 30, 2024 · ICLR 2025 (Spotlight)
- TinyClick: Single-Turn Agent for Empowering GUI Automation · October 9, 2024 · INTERSPEECH 2025