GUI Agents Papers

Android Coach: Improve Online Agentic Training Efficiency with Single State Multiple Actions

Guo Gan, Yuxuan Ding, Cong Chen, Yuwei Ren, Yin Huang, Hong Zhou

🏛 Institutions
Unknown
📅 Date
April 8, 2026
📑 Publisher
arXiv
💻 Env
Mobile
TLDR

Android Coach shifts online RL training from Single State Single Action to Single State Multiple Actions by learning a critic that estimates action values and integrating a process reward model with group-wise advantage estimation. It improves UI-TARS-1.5-7B by 7.5% on AndroidLab and 8.3% on AndroidWorld with 1.4x higher training efficiency than PPO and GRPO.
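The group-wise advantage idea can be illustrated with a minimal sketch. It assumes that several candidate actions are scored at one state and that each score is normalized against the group's mean and standard deviation, as in GRPO-style estimators. The function name `group_advantages`, the `eps` parameter, and the example scores are hypothetical. The summary above does not specify how the paper combines the critic with the process reward model, so this should not be read as the authors' implementation.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_advantages(action_values: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Normalize per-action scores for actions sampled at a single state.

    Assumption: each score is a scalar estimate of how good one candidate
    action is (for example, a critic's action-value estimate). The advantage
    of an action is its score relative to the group mean, scaled by the
    group's population standard deviation.
    """
    if not action_values:
        return []
    mu = mean(action_values)
    sigma = pstdev(action_values)
    # eps avoids division by zero when every action receives the same score.
    return [(q - mu) / (sigma + eps) for q in action_values]


# Hypothetical scores for three candidate actions at one screen state.
scores = [0.2, 0.5, 0.8]
advantages = group_advantages(scores)
```

In this sketch, actions scored above the group mean get a positive advantage and those below get a negative one, so a single visited state yields a learning signal for every sampled action rather than for only one.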
