ReGUIDE: Data Efficient GUI Grounding via Spatial Reasoning and Search

Hyunseok Lee, Jeonghoon Kim, Beomjun Kim, Jihoon Tack, Chansong Jo, Jaehong Lee, Cheonbok Park, Sookyo In, Jinwoo Shin, Kang Min Yoo

🏛 Institutions: KAIST, NAVER Cloud
📅 Date: May 21, 2025
📑 Publisher: arXiv
💻 Env: Web
🔑 Keywords: GUI grounding, spatial reasoning, data efficiency, test-time scaling, ReGUIDE

TLDR

ReGUIDE improves web GUI grounding under limited data by combining self-generated reasoning, spatially aware criticism, and test-time spatial search. It substantially outperforms baselines while using only a tiny fraction of the training data required by prior web-grounding approaches.

Related papers (24)

Zoom in, Click out: Unlocking and Evaluating the Potential of Zooming for GUI Grounding

December 5, 2025 · arXiv
OpeFlo: Automated UX Evaluation via Simulated Human Web Interaction with GUI Grounding

February 25, 2026 · arXiv
WebTestPilot: Agentic End-to-End Web Testing against Natural Language Specification by Inferring Oracles with Symbolized GUI Elements

February 12, 2026 · arXiv
Agentic Test-Time Scaling for WebAgents

February 12, 2026 · arXiv
VenusBench-GD: A Comprehensive Multi-Platform GUI Benchmark for Diverse Grounding Tasks

December 18, 2025 · arXiv
Test-Time Reinforcement Learning for GUI Grounding via Region Consistency

August 7, 2025 · AAAI 2026
WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning

May 22, 2025 · EMNLP 2025 (Poster)
ScaleTrack: Scaling and back-tracking Automated GUI Agents

May 1, 2025 · arXiv
GUI-R1: A Generalist R1-Style Vision-Language Action Model for GUI Agents

April 14, 2025 · arXiv
UI-TARS: Pioneering Automated GUI Interaction with Native Agents

January 21, 2025 · arXiv
Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining

December 13, 2024 · arXiv
Ponder & Press: Advancing Visual GUI Agent towards General Computer Control

December 2, 2024 · Findings of ACL 2025
Improved GUI Grounding via Iterative Narrowing

November 18, 2024 · arXiv
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

October 30, 2024 · ICLR 2025 (Spotlight)
TinyClick: Single-Turn Agent for Empowering GUI Automation

October 9, 2024 · INTERSPEECH 2025
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents

October 7, 2024 · ICLR 2025 (Oral)
Dual-View Visual Contextualization for Web Navigation

February 6, 2024 · CVPR 2024 (Poster)
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents

January 17, 2024 · ACL 2024
GUI-C²: Coarse-to-Fine GUI Grounding via Difficulty-Aware Reinforcement Learning

May 29, 2026 · arXiv
UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding

April 15, 2026 · arXiv
GUI-Perturbed: Domain Randomization Reveals Systematic Brittleness in GUI Grounding Models

April 15, 2026 · arXiv
See, Point, Refine: Multi-Turn Approach to GUI Grounding with Visual Feedback

April 14, 2026 · arXiv
What's Missing in Screen-to-Action? Towards a UI-in-the-Loop Paradigm for Multimodal GUI Reasoning

April 8, 2026 · Findings of ACL 2026
Towards GUI Agents: Vision-Language Diffusion Models for GUI Grounding

March 27, 2026 · CVPR 2026