GUI-R1: A Generalist R1-Style Vision-Language Action Model for GUI Agents
Run Luo, Lu Wang, Wanwei He, Longze Chen, Jiaming Li, Min Yang, Xiaobo Xia
- 🏛 Institutions
- Shenzhen Institute of Advanced Technology, CAS, University of Chinese Academy of Sciences, NUS
- 📅 Date
- April 14, 2025
- 📑 Publisher
- arXiv
- 💻 Env
- Desktop Mobile Web
- 🔑 Keywords
TLDR
GUI-R1 applies R1-style reinforcement learning to GUI action modeling by training a vision-language agent with unified action-space rules across Windows, Linux, macOS, Android, and Web. Using only a small curated cross-platform dataset, it reports stronger performance than prior methods across eight benchmarks and highlights RL's data-efficiency benefits for GUI agents.
Related papers
- Don't Act Blindly: Robust GUI Automation via Action-Effect Verification and Self-CorrectionApril 7, 2026 · ACL 2026
- Generalization in Online Reinforcement Learning for Mobile AgentsMarch 8, 2026 · arXiv
- AgentCPM‑GUI: Building Mobile‑Use Agents with Reinforcement Fine‑TuningJune 2, 2025 · EMNLP 2025 System Demonstrations
- ARPO:End-to-End Policy Optimization for GUI Agents with Experience ReplayMay 22, 2025 · arXiv
- UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement LearningMarch 27, 2025 · arXiv
- CGL: Advancing Continual GUI Learning via Reinforcement Fine-TuningMarch 3, 2026 · arXiv