Zoom in, Click out: Unlocking and Evaluating the Potential of Zooming for GUI Grounding
Zhiyuan Jiang, Shenghao Xie, Wenyi Li, Wenqiang Zu, Peihang Li, Jiahao Qiu, Siqi Pei, Lei Ma, Tiejun Huang, Mengdi Wang, Shilong Liu
- 🏛 Institutions
- Xi’an Jiaotong University, Princeton, PKU, University of Chinese Academy of Sciences, HKU, Michigan State University
- 📅 Date
- December 5, 2025
- 📑 Publisher
- arXiv
- 💻 Env
- General GUI
TLDR
This paper studies zooming as a test-time prior for GUI grounding and proposes ZoomClick, which decides when to zoom, how far to zoom, and when to return to the original view during localization. It also introduces GUIZoom-Bench and reports stronger grounding results across several mainstream benchmarks.
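To make the idea of zooming during grounding concrete, here is a minimal Python sketch of a generic iterative zoom-in loop. It is not the paper's ZoomClick procedure: the summary above does not give the rules for when to zoom, how far, or when to return to the original view, so the sketch swaps in a fixed shrink factor and a fixed step count. Every name in it (`zoom_and_click`, `crop_around`, `to_original`, `predict`, `shrink`, `max_steps`, `min_size`) is hypothetical, and `predict` stands in for any grounding model that returns a point normalized to the view it is shown.

```python
from typing import Callable, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in original pixels
Point = Tuple[float, float]


def to_original(norm_point: Point, view: Box) -> Point:
    """Map a point normalized to [0, 1] within `view` back to original pixels."""
    left, top, right, bottom = view
    return (left + norm_point[0] * (right - left),
            top + norm_point[1] * (bottom - top))


def crop_around(point: Point, view: Box, shrink: float, min_size: float) -> Box:
    """Return a smaller box centered on `point` and clamped to stay inside `view`."""
    left, top, right, bottom = view
    # Shrink each side, but never below min_size and never above the current size.
    w = min(max((right - left) * shrink, min_size), right - left)
    h = min(max((bottom - top) * shrink, min_size), bottom - top)
    x0 = min(max(point[0] - w / 2, left), right - w)
    y0 = min(max(point[1] - h / 2, top), bottom - h)
    return (x0, y0, x0 + w, y0 + h)


def zoom_and_click(
    predict: Callable[[Box], Point],   # hypothetical model wrapper
    screen_size: Tuple[int, int],
    shrink: float = 0.5,               # assumed fixed zoom factor
    max_steps: int = 3,                # assumed fixed zoom budget
    min_size: float = 64.0,            # assumed smallest useful crop side
) -> Point:
    """Predict on the full screen, then re-predict on progressively tighter crops."""
    view: Box = (0.0, 0.0, float(screen_size[0]), float(screen_size[1]))
    point = to_original(predict(view), view)
    for _ in range(max_steps):
        new_view = crop_around(point, view, shrink, min_size)
        if new_view == view:  # cannot zoom any further
            break
        view = new_view
        point = to_original(predict(view), view)
    # The returned point is always expressed in original-screen coordinates.
    return point
```

In practice `predict` would crop the screenshot to the given box, run the model on that crop, and return the predicted click position as fractions of the crop's width and height. The coordinate mapping in `to_original` is the part any zoom-based approach needs, since the final click has to land on the full screen rather than on the crop.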
Related papers (24)
- UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding · April 15, 2026 · arXiv
- Zoom to Essence: Trainless GUI Grounding by Inferring upon Interface Elements · March 15, 2026 · arXiv
- Trifuse: Enhancing Attention-Based GUI Grounding via Multimodal Fusion · February 6, 2026 · arXiv
- MVP: Multiple View Prediction Improves GUI Grounding · December 9, 2025 · arXiv
- ReGUIDE: Data Efficient GUI Grounding via Spatial Reasoning and Search · May 21, 2025 · arXiv
- Improved GUI Grounding via Iterative Narrowing · November 18, 2024 · arXiv
- STaR-KV: Spatio-Temporal Adaptive Re-weighting for KV Cache Compression in GUI Vision-Language Models · June 1, 2026 · arXiv
- GUI-C²: Coarse-to-Fine GUI Grounding via Difficulty-Aware Reinforcement Learning · May 29, 2026 · arXiv
- GUI-Perturbed: Domain Randomization Reveals Systematic Brittleness in GUI Grounding Models · April 15, 2026 · arXiv
- See, Point, Refine: Multi-Turn Approach to GUI Grounding with Visual Feedback · April 14, 2026 · arXiv
- What's Missing in Screen-to-Action? Towards a UI-in-the-Loop Paradigm for Multimodal GUI Reasoning · April 8, 2026 · Findings of ACL 2026
- Towards GUI Agents: Vision-Language Diffusion Models for GUI Grounding · March 27, 2026 · CVPR 2026
- AdaZoom-GUI: Adaptive Zoom-based GUI Grounding with Instruction Refinement · March 18, 2026 · arXiv
- Moving Beyond Sparse Grounding with Complete Screen Parsing Supervision · February 15, 2026 · arXiv
- POINTS-GUI-G: GUI-Grounding Journey · February 6, 2026 · arXiv
- SSL: Sweet Spot Learning for Differentiated Guidance in Agentic Optimization · January 30, 2026 · arXiv
- Darwinian Memory: A Training-Free Self-Regulating Memory System for GUI Agent Evolution · January 30, 2026 · arXiv
- GAIA: A Data Flywheel System for Training GUI Test-Time Scaling Critic Models · January 26, 2026 · arXiv
- V2P: Visual Attention Calibration for GUI Grounding via Background Suppression and Center Peaking · January 11, 2026 · arXiv
- Beyond Clicking: A Step Towards Generalist GUI Grounding via Text Dragging · November 7, 2025 · arXiv
- GUI-Spotlight: Adaptive Iterative Focus Refinement for Enhanced GUI Visual Grounding · October 5, 2025 · arXiv
- GUI-KV: Efficient GUI Agents via KV Cache with Spatio-Temporal Awareness · October 1, 2025 · arXiv
- UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding · July 29, 2025 · CVPR 2026 Findings
- GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUI Agents · May 21, 2025 · NeurIPS 2025 (Poster)