GUI Agents Papers

GUI-R1: A Generalist R1-Style Vision-Language Action Model for GUI Agents

Run Luo, Lu Wang, Wanwei He, Longze Chen, Jiaming Li, Min Yang, Xiaobo Xia

🏛 Institutions
Shenzhen Institute of Advanced Technology, CAS; University of Chinese Academy of Sciences; NUS
📅 Date
April 14, 2025
📑 Publisher
arXiv
💻 Env
Desktop, Mobile, Web
TLDR

GUI-R1 applies R1-style reinforcement learning to GUI action modeling by training a vision-language agent with unified action-space rules across Windows, Linux, macOS, Android, and Web. Using only a small curated cross-platform dataset, it reports stronger performance than prior methods across eight benchmarks and highlights RL's data-efficiency benefits for GUI agents.
