GuirlVG: Incentivize GUI Visual Grounding via Empirical Exploration on Reinforcement Learning
Weitai Kang, Bin Lei, Gaowen Liu, Caiwen Ding, Yan Yan
- 🏛 Institutions
- University of Illinois Chicago, University of Minnesota, Cisco Research
- 📅 Date
- August 6, 2025
- 📑 Publisher
- ICLR 2026 (Poster)
- 💻 Env
- Desktop, Mobile, Web
TLDR
GuirlVG studies how to make rule-based reinforcement fine-tuning (RFT) work for GUI visual grounding, rather than applying the standard rule-based RL recipe naively. It systematically explores reward design, prediction format, and training configuration, introduces an Adversarial KL Factor to stabilize training, and reports stronger results across the ScreenSpot benchmark family, outperforming SFT methods trained on far larger datasets while using only 5.2K training samples.
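To make "rule-based RL" for grounding concrete, here is a minimal Python sketch of the kind of verifiable reward commonly used in GUI grounding RFT, paired with GRPO-style group-normalized advantages. It is an illustration under stated assumptions, not GuirlVG's implementation: the "(x, y)" output format, the point-in-box accuracy rule, the `format_weight` bonus, and the helper names `parse_point`, `grounding_reward`, and `group_advantages` are all hypothetical, and the KL term (including the paper's Adversarial KL Factor) is omitted entirely.

```python
import re
from statistics import mean, pstdev
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

# Assumed (hypothetical) output format: the model emits a click point as "(x, y)".
_POINT_RE = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def parse_point(text: str) -> Optional[Tuple[float, float]]:
    """Return the first "(x, y)" pair found in the model output, or None."""
    m = _POINT_RE.search(text)
    if m is None:
        return None
    return float(m.group(1)), float(m.group(2))


def grounding_reward(text: str, gt_box: Box, format_weight: float = 0.1) -> float:
    """Hypothetical rule-based reward: a small bonus for a parseable answer,
    plus 1.0 if the predicted point falls inside the ground-truth box."""
    point = parse_point(text)
    if point is None:
        return 0.0  # unparseable output earns nothing
    x, y = point
    x1, y1, x2, y2 = gt_box
    hit = x1 <= x <= x2 and y1 <= y <= y2
    return format_weight + (1.0 if hit else 0.0)


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: standardize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; 0.0 when all rewards are equal
    return [(r - mu) / (sigma + eps) for r in rewards]
```

For example, four sampled answers scoring [1.1, 0.1, 0.1, 0.0] give a positive advantage only to the first, so the update pushes the policy toward outputs that land inside the target element. Choosing among such options (reward form, prediction format, training setup) is the design space the paper explores empirically.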
Related papers
- UI-Venus Technical Report: Building High-performance UI Agents with RFT · August 14, 2025 · arXiv
- Ponder & Press: Advancing Visual GUI Agent towards General Computer Control · December 2, 2024 · Findings of ACL 2025
- TinyClick: Single-Turn Agent for Empowering GUI Automation · October 9, 2024 · INTERSPEECH 2025
- SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents · January 17, 2024 · ACL 2024
- Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents · February 15, 2026 · arXiv
- VenusBench-GD: A Comprehensive Multi-Platform GUI Benchmark for Diverse Grounding Tasks · December 18, 2025 · arXiv