Test‑Time Reinforcement Learning for GUI Grounding via Region Consistency
Yong Du, Yuchen Yan, Fei Tang, Zhengxi Lu, Chang Zong, Weiming Lu, Shengpei Jiang, Yongliang Shen
- 🏛 Institutions
- ZJU, Central South University, Zhejiang University of Science and Technology, SF Technology
- 📅 Date
- August 7, 2025
- 📑 Publisher
- AAAI 2026
- 💻 Env
- Desktop, Mobile, Web
- 🔑 Keywords
TLDR
This paper uses consistency across multiple grounding predictions as a test-time signal for GUI grounding. GUI-RC aggregates sampled outputs into consensus regions without extra training, while GUI-RCPO turns the same signal into rewards for test-time policy optimization on unlabeled data, improving ScreenSpot results across several model families.
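The aggregation idea can be illustrated with a minimal sketch. It is not the authors' implementation: the summary only says that sampled outputs are aggregated into consensus regions and that the same signal is reused as a reward. The pixel-level vote grid, the choice of the bounding box of top-voted pixels, the normalized reward, and all function names below are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels, with x2 and y2 exclusive.
Box = Tuple[int, int, int, int]


def vote_grid(boxes: List[Box], width: int, height: int) -> List[List[int]]:
    """Count, for every pixel, how many sampled boxes cover it."""
    grid = [[0] * width for _ in range(height)]
    for x1, y1, x2, y2 in boxes:
        # Clip each box to the screen before voting.
        for y in range(max(0, y1), min(height, y2)):
            row = grid[y]
            for x in range(max(0, x1), min(width, x2)):
                row[x] += 1
    return grid


def consensus_box(grid: List[List[int]]) -> Box:
    """Bounding box of the pixels with the highest vote count.

    Simplification (assumption): if the top-voted pixels are not
    contiguous, this box spans all of them.
    """
    top = max(max(row) for row in grid)
    if top == 0:
        raise ValueError("no box covers any pixel")
    xs = [x for row in grid for x, v in enumerate(row) if v == top]
    ys = [y for y, row in enumerate(grid) if top in row]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def consistency_reward(box: Box, grid: List[List[int]]) -> float:
    """Mean vote inside `box`, divided by the maximum vote, in [0, 1]."""
    height, width = len(grid), len(grid[0])
    x1, y1 = max(0, box[0]), max(0, box[1])
    x2, y2 = min(width, box[2]), min(height, box[3])
    area = max(0, x2 - x1) * max(0, y2 - y1)
    top = max(max(row) for row in grid)
    if area == 0 or top == 0:
        return 0.0
    total = sum(sum(grid[y][x1:x2]) for y in range(y1, y2))
    return total / (area * top)


# Three sampled predictions for the same instruction on a 12x12 screen.
samples = [(2, 2, 8, 8), (4, 4, 10, 10), (3, 3, 9, 9)]
grid = vote_grid(samples, width=12, height=12)
print(consensus_box(grid))  # (4, 4, 8, 8)
```

In this sketch, a prediction that falls entirely inside the top-voted area receives a reward of 1.0, and one that overlaps no sampled box receives 0.0, so the reward needs no ground-truth labels.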
Related papers
- Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents · February 15, 2026 · arXiv
- Agentic Test-Time Scaling for WebAgents · February 12, 2026 · arXiv
- VenusBench-GD: A Comprehensive Multi-Platform GUI Benchmark for Diverse Grounding Tasks · December 18, 2025 · arXiv
- OS-Oracle: A Comprehensive Framework for Cross-Platform GUI Critic Models · December 18, 2025 · arXiv
- Surfer 2: The Next Generation of Cross-Platform Computer Use Agents · October 22, 2025 · arXiv
- Scaling Agents for Computer Use · October 2, 2025 · arXiv