GUI-Spotlight: Adaptive Iterative Focus Refinement for Enhanced GUI Visual Grounding

Bin Lei, Nuo Xu, Ali Payani, Mingyi Hong, Chunhua Liao, Yu Cao, Caiwen Ding

🏛 Institutions: University of Minnesota, Cisco Research, Lawrence Livermore National Labs
📅 Date: October 5, 2025
📑 Publisher: arXiv
💻 Env: General GUI
🔑 Keywords: GUI grounding, image-grounded reasoning, iterative focus refinement, ScreenSpot-Pro, GUI-Spotlight

TLDR

GUI-Spotlight is a GUI grounding model that performs image-grounded reasoning by iteratively invoking specialized tools to narrow attention to the relevant screen region. Trained with only 18.5K examples, it reaches 52.8% accuracy on ScreenSpot-Pro, outperforming prior 7B grounding models trained on much larger datasets.
