GUI Agents Papers

Android Coach: Improve Online Agentic Training Efficiency with Single State Multiple Actions

Guo Gan, Yuxuan Ding, Cong Chen, Yuwei Ren, Yin Huang, Hong Zhou

🏛 Institutions
Unknown
📅 Date
April 8, 2026
📑 Publisher
arXiv
💻 Env
Mobile
🔑 Keywords
TLDR

Android Coach shifts online RL training from a Single State Single Action paradigm to Single State Multiple Actions. It does this in two ways: it learns a critic that estimates action values, and it integrates a process reward model with group-wise advantage estimation. It improves UI-TARS-1.5-7B by 7.5% on AndroidLab and 8.3% on AndroidWorld, with 1.4x higher training efficiency than PPO and GRPO.
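The TLDR does not say exactly how the critic's action values, the process reward, and the group-wise advantages are combined. The sketch below is a minimal illustration under stated assumptions and is not the paper's method. It assumes that K actions are sampled from the same state s. Each action gets a critic estimate Q(s, a_k) and a process-reward score. The two signals are mixed with a hypothetical weight alpha. The mixed scores are then normalized within the group using a GRPO-style mean/std step. The function name group_advantages and the parameters alpha and eps are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def group_advantages(
    critic_values: Sequence[float],
    prm_scores: Sequence[float],
    alpha: float = 0.5,
    eps: float = 1e-6,
) -> list[float]:
    """Group-normalized advantages for K actions sampled at one state (hypothetical sketch).

    critic_values[k]: assumed critic estimate Q(s, a_k) for action a_k.
    prm_scores[k]:    assumed process-reward-model score for action a_k.
    alpha:            hypothetical mixing weight between the two signals.
    eps:              small constant that avoids division by zero.
    """
    if not critic_values or len(critic_values) != len(prm_scores):
        raise ValueError("need one critic value and one PRM score per action")
    # Assumption: combine the two signals as a convex mix per action.
    scores = [alpha * q + (1 - alpha) * r for q, r in zip(critic_values, prm_scores)]
    mu = mean(scores)
    sigma = pstdev(scores)  # population std over the group sharing state s
    # GRPO-style normalization: compare each action only to its siblings at s.
    return [(x - mu) / (sigma + eps) for x in scores]


if __name__ == "__main__":
    # Four hypothetical candidate actions from a single screen state.
    adv = group_advantages([0.2, 0.8, 0.5, 0.5], [1.0, 0.0, 1.0, 0.0])
    print([round(a, 3) for a in adv])
```

This sketch normalizes within the group of actions that share one state. As a result, each advantage measures how much better an action is than the other candidates at that state, so the sketch needs no separate baseline. That is the intuition behind getting more learning signal per visited state than single-action PPO/GRPO rollouts provide.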
