GUI-R1: A Generalist R1-Style Vision-Language Action Model for GUI Agents

Run Luo, Lu Wang, Wanwei He, Longze Chen, Jiaming Li, Min Yang, Xiaobo Xia

🏛 Institutions: Shenzhen Institute of Advanced Technology, CAS; University of Chinese Academy of Sciences; NUS
📅 Date: April 14, 2025
📑 Publisher: arXiv
💻 Env: Desktop, Mobile, Web
🔑 Keywords: reinforcement learning, unified action space, GRPO, data efficiency, GUI-R1

TLDR

GUI-R1 applies R1-style reinforcement learning to GUI action modeling by training a vision-language agent with unified action-space rules across Windows, Linux, macOS, Android, and Web. Using only a small curated cross-platform dataset, it reports stronger performance than prior methods across eight benchmarks and highlights RL's data-efficiency benefits for GUI agents.
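
For readers unfamiliar with the GRPO keyword above, the following is a minimal sketch of the group-relative advantage idea that GRPO-style training is built on: several responses are sampled for the same prompt, each is scored by a rule-based reward, and each reward is normalized against its own group. The function name, the example rewards, and the use of the population standard deviation are illustrative assumptions, not details taken from the GUI-R1 paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its sampling group.

    `rewards` holds one scalar reward per sampled response to the same prompt.
    Hypothetical helper for illustration; not the paper's implementation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    # eps keeps the division finite when every response gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four sampled action predictions, two of which match the target action.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Responses that score above their group's average receive a positive advantage and those below receive a negative one, so no separately learned value model is needed to rank them.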
