Adaptive Milestone Reward for GUI Agents
Congmin Zheng, Xiaoyun Mo, Xinbei Ma, Qiqiang Lin, Yin Zhao, Jiachen Zhu, Xingyu Lou, Jun Wang, Zhaoxiang Wang, Weiwen Liu, Zhuosheng Zhang, Yong Yu, Weinan Zhang
- 🏛 Institutions: SJTU, OPPO Research Institute
- 📅 Date: February 12, 2026
- 📑 Publisher: arXiv
- 💻 Env: Mobile
TLDR
ADMIRE is a reinforcement-learning reward design for GUI agents that distills adaptive, verifiable milestones from successful trajectories and pairs them with asymmetric credit assignment. It improves AndroidWorld performance by more than 10 absolute points and transfers to other RL algorithms and environments.
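The TLDR combines two ideas: milestone-based partial credit and asymmetric treatment of successful versus failed episodes. The sketch below shows one simple way these could fit together. It is not ADMIRE's formulation. Everything in it is an illustrative assumption: the hypothetical `Milestone` class (a substring check on screen text), `milestone_progress` (milestones must be reached in order), `shaped_reward`, and the `partial_scale=0.5` cap. The milestones are also hand-written here rather than distilled from successful trajectories.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Milestone:
    """Hypothetical verifiable checkpoint: text expected to appear in an observed screen state."""
    description: str
    expected_text: str

    def is_met(self, state: str) -> bool:
        return self.expected_text in state


def milestone_progress(states: Sequence[str], milestones: Sequence[Milestone]) -> float:
    """Fraction of milestones reached in order along the trajectory, from 0.0 to 1.0."""
    if not milestones:
        return 0.0
    next_idx = 0
    for state in states:
        # One state may satisfy several consecutive milestones.
        while next_idx < len(milestones) and milestones[next_idx].is_met(state):
            next_idx += 1
        if next_idx == len(milestones):
            break
    return next_idx / len(milestones)


def shaped_reward(
    states: Sequence[str],
    milestones: Sequence[Milestone],
    task_succeeded: bool,
    partial_scale: float = 0.5,  # assumed cap on credit for failed episodes
) -> float:
    """Illustrative asymmetric shaping: successes always get 1.0; failures earn at most
    `partial_scale` for milestone progress, so a failure can never outscore a success."""
    if task_succeeded:
        return 1.0
    return partial_scale * milestone_progress(states, milestones)


# Usage: a failed "turn on Wi-Fi" episode that reached two of three milestones.
milestones = [
    Milestone("open settings", "Settings"),
    Milestone("open Wi-Fi page", "Wi-Fi"),
    Milestone("Wi-Fi enabled", "Wi-Fi: On"),
]
states = ["Home screen", "Settings", "Wi-Fi: Off"]
print(round(shaped_reward(states, milestones, task_succeeded=False), 3))  # 0.333
```

Capping failed-episode credit below the success reward is one way to keep dense milestone signals from rewarding partial progress as much as task completion. Whether ADMIRE uses this or a different asymmetry is not stated in this summary.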
Related papers
- UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience · March 25, 2026 · arXiv
- SSL: Sweet Spot Learning for Differentiated Guidance in Agentic Optimization · January 30, 2026 · arXiv
- ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents · April 13, 2026 · arXiv
- Android Coach: Improve Online Agentic Training Efficiency with Single State Multiple Actions · April 8, 2026 · arXiv
- Don't Act Blindly: Robust GUI Automation via Action-Effect Verification and Self-Correction · April 7, 2026 · ACL 2026
- HATS: Hardness-Aware Trajectory Synthesis for GUI Agents · March 12, 2026 · CVPR 2026