Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation

Pengxiang Li, Zechen Hu, Zirui Shang, Jingrong Wu, Yang Liu, Hui Liu, Zhi Gao, Chenrui Shi, Bofei Zhang, Zihao Zhang, Xiaochuan Shi, Zedong YU, Yuwei Wu, Xinxiao Wu, Yunde Jia, Liuyu Xiang, Zhaofeng He, Qing Li

🏛 Institutions: Beijing Institute of Technology, State Key Laboratory of General Artificial Intelligence, BIGAI, DataCanvas, Beijing University of Posts and Telecommunications, Shenzhen MSU-BIT University
📅 Date: September 28, 2025
📑 Publisher: arXiv
💻 Env: Desktop, Mobile
🔑 Keywords: reinforcement learning, decoupled training, adaptive data curation, asynchronous modules, OSWorld, DART

TLDR

DART is a decoupled RL training framework for GUI agents that separates environment execution, rollout service, data management, and training into asynchronous modules to improve multi-turn learning efficiency. It pairs that system design with adaptive data curation, including difficulty-aware rollout control and high-entropy step selection, and substantially improves OSWorld performance over the base model.
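
The two curation ideas named above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the linear rollout budget, and the `keep_fraction`, `min_rollouts`, and `max_rollouts` values are all hypothetical choices made for illustration. It assumes step entropy is the mean token entropy of the policy's output at that step, and that a task's difficulty is approximated by its recent success rate.

```python
import math
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one token's probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def step_entropy(token_dists: Sequence[Sequence[float]]) -> float:
    """Mean token entropy over all tokens generated in one agent step."""
    if not token_dists:
        return 0.0
    return sum(token_entropy(d) for d in token_dists) / len(token_dists)


def select_high_entropy_steps(
    step_entropies: Sequence[float],
    keep_fraction: float = 0.8,  # hypothetical value, not taken from the paper
) -> List[int]:
    """Return indices (in trajectory order) of the highest-entropy steps."""
    if not step_entropies:
        return []
    k = max(1, math.ceil(keep_fraction * len(step_entropies)))
    ranked = sorted(
        range(len(step_entropies)),
        key=lambda i: step_entropies[i],
        reverse=True,
    )
    return sorted(ranked[:k])


def rollouts_for_task(
    success_rate: float,
    min_rollouts: int = 2,  # hypothetical bounds, not taken from the paper
    max_rollouts: int = 8,
) -> int:
    """Assumed difficulty-aware budget: lower success rate means more rollouts."""
    success_rate = min(1.0, max(0.0, success_rate))
    return min_rollouts + round((1.0 - success_rate) * (max_rollouts - min_rollouts))
```

For example, `select_high_entropy_steps([0.1, 0.9, 0.5, 0.7], keep_fraction=0.5)` keeps steps 1 and 3, so only the steps where the policy was least certain contribute to the update, while `rollouts_for_task` spends more sampling budget on tasks the agent rarely solves.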
