AdaZoom-GUI: Adaptive Zoom-based GUI Grounding with Instruction Refinement
Siqi Pei, Liang Tang, Tiaonan Duan, Long Chen, Shuxian Li, Kaer Huang, Yanzhe Jing, Yiqiang Yan, Bo Zhang, Chenghao Jiang, Borui Zhang, Jiwen Lu
- 🏛 Institutions: Lenovo Research
- 📅 Date: March 18, 2026
- 📑 Publisher: arXiv
- 💻 Env: General GUI
- 🔑 Keywords
TLDR
AdaZoom-GUI targets two concrete GUI-grounding bottlenecks: ambiguous natural-language instructions and tiny UI elements in high-resolution screenshots. It combines instruction rewriting with a conditional second-stage zoom-in pass, and it reports state-of-the-art grounding performance among models of comparable size.
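The two-stage idea in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the `refine` and `grounder` callables, the confidence-based trigger, and the `zoom` and `threshold` defaults are all hypothetical stand-ins, since the summary does not say what condition triggers the zoom-in pass or how the instruction is rewritten.

```python
from typing import Callable, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels

# Hypothetical grounder interface (an assumption, not from the paper):
# given an instruction and a screen region, return (x, y, confidence),
# where x and y are normalized to [0, 1] inside that region.
Grounder = Callable[[str, Box], Tuple[float, float, float]]


def crop_around(cx: float, cy: float, width: int, height: int, zoom: float) -> Box:
    """Window of size (width/zoom, height/zoom) centered near (cx, cy), clamped to the screen."""
    w, h = width / zoom, height / zoom
    left = min(max(cx - w / 2, 0.0), width - w)
    top = min(max(cy - h / 2, 0.0), height - h)
    return (left, top, left + w, top + h)


def to_pixels(nx: float, ny: float, box: Box) -> Tuple[float, float]:
    """Map region-normalized coordinates back to full-screenshot pixels."""
    left, top, right, bottom = box
    return (left + nx * (right - left), top + ny * (bottom - top))


def ground(
    instruction: str,
    width: int,
    height: int,
    refine: Callable[[str], str],
    grounder: Grounder,
    zoom: float = 2.0,        # assumed zoom factor
    threshold: float = 0.8,   # assumed confidence cutoff for skipping the zoom pass
) -> Tuple[float, float]:
    """Rewrite the instruction, ground once, and zoom in only when the first pass is unsure."""
    refined = refine(instruction)
    full: Box = (0.0, 0.0, float(width), float(height))
    nx, ny, conf = grounder(refined, full)
    x, y = to_pixels(nx, ny, full)
    if conf >= threshold:
        return (x, y)  # first pass is trusted; no second stage
    # Second stage: re-run grounding on a crop around the first guess.
    box = crop_around(x, y, width, height, zoom)
    nx2, ny2, _ = grounder(refined, box)
    return to_pixels(nx2, ny2, box)
```

The coordinate bookkeeping is the part most likely to carry over to any real system: the second-stage prediction is expressed relative to the crop, so it has to be mapped back through the crop box before it can be used as a click position on the original screenshot.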
Related papers
- UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding · April 15, 2026 · arXiv
- Towards GUI Agents: Vision-Language Diffusion Models for GUI Grounding · March 27, 2026 · CVPR 2026
- GUI-Perturbed: Domain Randomization Reveals Systematic Brittleness in GUI Grounding Models · April 15, 2026 · arXiv
- See, Point, Refine: Multi-Turn Approach to GUI Grounding with Visual Feedback · April 14, 2026 · arXiv
- What's Missing in Screen-to-Action? Towards a UI-in-the-Loop Paradigm for Multimodal GUI Reasoning · April 8, 2026 · Findings of ACL 2026
- Zoom to Essence: Trainless GUI Grounding by Inferring upon Interface Elements · March 15, 2026 · arXiv