GUI Agents Papers

GUI-R1: A Generalist R1-Style Vision-Language Action Model for GUI Agents

Run Luo, Lu Wang, Wanwei He, Longze Chen, Jiaming Li, Min Yang, Xiaobo Xia

🏛 Institutions
Shenzhen Institute of Advanced Technology, CAS; University of Chinese Academy of Sciences; NUS
📅 Date
April 14, 2025
📑 Publisher
arXiv
💻 Env
Desktop, Mobile, Web
🔑 Keywords
TLDR

GUI-R1 applies R1-style reinforcement learning to GUI action modeling by training a vision-language agent with unified action-space rules across Windows, Linux, macOS, Android, and Web. Using only a small curated cross-platform dataset, it reports stronger performance than prior methods across eight benchmarks and highlights RL's data-efficiency benefits for GUI agents.
