Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation
Pengxiang Li, Zechen Hu, Zirui Shang, Jingrong Wu, Yang Liu, Hui Liu, Zhi Gao, Chenrui Shi, Bofei Zhang, Zihao Zhang, Xiaochuan Shi, Zedong YU, Yuwei Wu, Xinxiao Wu, Yunde Jia, Liuyu Xiang, Zhaofeng He, Qing Li
- 🏛 Institutions
- Beijing Institute of Technology, State Key Laboratory of General Artificial Intelligence, BIGAI, DataCanvas, Beijing University of Posts and Telecommunications, Shenzhen MSU-BIT University
- 📅 Date
- September 28, 2025
- 📑 Publisher
- arXiv
- 💻 Env
- Desktop, Mobile
- 🔑 Keywords
TLDR
DART is a decoupled RL training framework for GUI agents that separates environment execution, rollout service, data management, and training into asynchronous modules to improve multi-turn learning efficiency. It pairs that system design with adaptive data curation, including difficulty-aware rollout control and high-entropy step selection, and substantially improves OSWorld performance over the base model.
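The two data-curation ideas named above can be illustrated with a minimal Python sketch. This summary does not give the paper's actual rules, so everything here is an assumption made for illustration: the function names, the `keep_fraction` parameter, the rollout bounds, and the linear mapping from success rate to rollout count are all hypothetical. The sketch scores each agent step by the mean entropy of its token distributions, keeps the highest-entropy steps for training, and gives harder tasks more rollouts.

```python
import math
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def step_entropy(token_dists: Sequence[Sequence[float]]) -> float:
    """Mean token entropy over the tokens generated in one agent step."""
    return sum(token_entropy(d) for d in token_dists) / len(token_dists)


def select_high_entropy_steps(
    step_entropies: Sequence[float], keep_fraction: float = 0.8
) -> List[int]:
    """Return indices of the highest-entropy steps, in original order.

    keep_fraction is a hypothetical knob, not a value from the paper.
    """
    n_keep = max(1, math.ceil(keep_fraction * len(step_entropies)))
    ranked = sorted(
        range(len(step_entropies)),
        key=lambda i: step_entropies[i],
        reverse=True,
    )
    return sorted(ranked[:n_keep])


def rollouts_for_task(
    success_rate: float, min_rollouts: int = 2, max_rollouts: int = 8
) -> int:
    """Assumed rule: the lower a task's success rate, the more rollouts."""
    success_rate = min(1.0, max(0.0, success_rate))
    extra = round((1.0 - success_rate) * (max_rollouts - min_rollouts))
    return min_rollouts + extra
```

For example, `select_high_entropy_steps([0.1, 0.9, 0.5, 0.7], keep_fraction=0.5)` keeps steps 1 and 3, and `rollouts_for_task(0.0)` returns the assumed maximum of 8 rollouts, while a task that is always solved gets the assumed minimum of 2.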
Related papers (24)
- ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents · August 19, 2025 · ICLR 2026 (Poster)
- From Off-Policy to On-Policy: Enhancing GUI Agents via Bi-level Expert-to-Policy Assimilation · January 9, 2026 · arXiv
- ZeroGUI: Automating Online GUI Learning at Zero Human Cost · May 29, 2025 · arXiv
- GUI-R1: A Generalist R1-Style Vision-Language Action Model for GUI Agents · April 14, 2025 · arXiv
- MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research · May 25, 2026 · arXiv
- ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents · April 13, 2026 · arXiv
- Android Coach: Improve Online Agentic Training Efficiency with Single State Multiple Actions · April 8, 2026 · arXiv
- Don't Act Blindly: Robust GUI Automation via Action-Effect Verification and Self-Correction · April 7, 2026 · ACL 2026
- IntentScore: Intent-Conditioned Action Evaluation for Computer-Use Agents · April 6, 2026 · arXiv
- GUIDE: Resolving Domain Bias in GUI Agents through Real-Time Web Video Retrieval and Plug-and-Play Annotation · March 27, 2026 · arXiv
- UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience · March 25, 2026 · arXiv
- Generalization in Online Reinforcement Learning for Mobile Agents · March 8, 2026 · arXiv
- Adaptive Milestone Reward for GUI Agents · February 12, 2026 · arXiv
- UI-Mem: Self-Evolving Experience Memory for Online Reinforcement Learning in Mobile GUI Agents · February 5, 2026 · arXiv
- EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience · January 22, 2026 · arXiv
- CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents · January 14, 2026 · arXiv
- SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents · December 26, 2025 · arXiv
- Watch and Learn: Learning to Use Computers from Online Videos · October 6, 2025 · CVPR 2026
- Scaling Agents for Computer Use · October 2, 2025 · arXiv
- Evolving in Tasks: Empowering the Multi-modality Large Language Model as the Computer Use Agent · August 6, 2025 · arXiv
- CoAct-1: Computer-using Multi-Agent System with Coding Actions · August 5, 2025 · ICLR 2026 (Poster)
- DPO Learning with LLMs-Judge Signal for Computer Use Agents · June 3, 2025 · arXiv
- AgentCPM‑GUI: Building Mobile‑Use Agents with Reinforcement Fine‑Tuning · June 2, 2025 · EMNLP 2025 System Demonstrations
- LiteCUA: Computer as MCP Server for Computer-Use Agent on AIOS · May 24, 2025 · arXiv