Test-Time Reinforcement Learning for GUI Grounding via Region Consistency

Yong Du, Yuchen Yan, Fei Tang, Zhengxi Lu, Chang Zong, Weiming Lu, Shengpei Jiang, Yongliang Shen

🏛 Institutions: ZJU, Central South University, Zhejiang University of Science and Technology, SF Technology
📅 Date: August 7, 2025
📑 Publisher: AAAI 2026
💻 Env: Desktop, Mobile, Web
🔑 Keywords: GUI-RC, GUI-RCPO, region consistency, test-time scaling, test-time reinforcement learning

TLDR

This paper uses consistency across multiple grounding predictions as a test-time signal for GUI grounding. GUI-RC aggregates sampled outputs into consensus regions without extra training, while GUI-RCPO turns the same signal into rewards for test-time policy optimization on unlabeled data, improving ScreenSpot results across several model families.
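The consensus idea can be illustrated with a small sketch. It is not the authors' implementation: the per-pixel voting grid, the box format, the function names (vote_grid, consensus_box, consistency_reward), and the reward normalization are all assumptions made for illustration. Sampled boxes vote on the pixels they cover, the most-voted pixels define a consensus region, and a candidate box can be scored by how many votes it collects on average.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixels, with x2 and y2 exclusive.
Box = Tuple[int, int, int, int]


def vote_grid(boxes: List[Box], width: int, height: int) -> List[List[int]]:
    """Count, for every pixel, how many sampled boxes cover it."""
    grid = [[0] * width for _ in range(height)]
    for x1, y1, x2, y2 in boxes:
        # Clip each box to the screen before voting.
        for y in range(max(0, y1), min(height, y2)):
            for x in range(max(0, x1), min(width, x2)):
                grid[y][x] += 1
    return grid


def consensus_box(boxes: List[Box], width: int, height: int) -> Box:
    """Bounding box of the pixels with the highest vote count.

    Simplification: if the top-voted pixels form several disjoint areas,
    this merges them into one enclosing box.
    """
    grid = vote_grid(boxes, width, height)
    top = max((v for row in grid for v in row), default=0)
    if top == 0:
        raise ValueError("no sampled box overlaps the screen")
    cells = [(x, y) for y, row in enumerate(grid)
             for x, v in enumerate(row) if v == top]
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def consistency_reward(box: Box, grid: List[List[int]]) -> float:
    """Hypothetical reward: mean votes inside the box over the max vote count."""
    height = len(grid)
    width = len(grid[0]) if height else 0
    x1, y1 = max(0, box[0]), max(0, box[1])
    x2, y2 = min(width, box[2]), min(height, box[3])
    area = max(0, x2 - x1) * max(0, y2 - y1)
    top = max((v for row in grid for v in row), default=0)
    if area == 0 or top == 0:
        return 0.0
    total = sum(grid[y][x] for y in range(y1, y2) for x in range(x1, x2))
    return total / (area * top)


# Three made-up samples for the same instruction on a 60x60 screen.
samples = [(10, 10, 30, 30), (20, 20, 40, 40), (25, 25, 50, 50)]
region = consensus_box(samples, 60, 60)
grid = vote_grid(samples, 60, 60)
print(region, consistency_reward(region, grid))
```

In this sketch the training-free path stops at consensus_box, while a score like consistency_reward stands in for the kind of label-free signal that could drive test-time policy optimization.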
