Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation
Pengxiang Li, Zechen Hu, Zirui Shang, Jingrong Wu, Yang Liu, Hui Liu, Zhi Gao, Chenrui Shi, Bofei Zhang, Zihao Zhang, Xiaochuan Shi, Zedong YU, Yuwei Wu, Xinxiao Wu, Yunde Jia, Liuyu Xiang, Zhaofeng He, Qing Li
- 🏛 Institutions
- Beijing Institute of Technology, State Key Laboratory of General Artificial Intelligence, BIGAI, DataCanvas, Beijing University of Posts and Telecommunications, Shenzhen MSU-BIT University
- 📅 Date
- September 28, 2025
- 📑 Publisher
- arXiv
- 💻 Env
- Desktop Mobile
- 🔑 Keywords
TLDR
DART is a decoupled RL training framework for GUI agents that separates environment execution, rollout service, data management, and training into asynchronous modules to improve multi-turn learning efficiency. It pairs that system design with adaptive data curation, including difficulty-aware rollout control and high-entropy step selection, and substantially improves OSWorld performance over the base model.
Related papers
- ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use AgentsAugust 19, 2025 · ICLR 2026 (Poster)
- From Off-Policy to On-Policy: Enhancing GUI Agents via Bi-level Expert-to-Policy AssimilationJanuary 9, 2026 · arXiv
- ZeroGUI: Automating Online GUI Learning at Zero Human CostMay 29, 2025 · arXiv
- GUI-R1: A Generalist R1-Style Vision-Language Action Model for GUI AgentsApril 14, 2025 · arXiv
- ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI AgentsApril 13, 2026 · arXiv
- Android Coach: Improve Online Agentic Training Efficiency with Single State Multiple ActionsApril 8, 2026 · arXiv