ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay

Fanbin Lu, Zhisheng Zhong, Shu Liu, Chi-Wing Fu, Jiaya Jia

🏛 Institutions: CUHK, SmartMore, HKUST
📅 Date: May 22, 2025
📑 Publisher: arXiv
💻 Env: Desktop
🔑 Keywords: reinforcement learning, experience replay, GRPO, task selection, ARPO

TLDR

ARPO studies end-to-end reinforcement learning for GUI agents in long-horizon desktop environments where sparse rewards and rollout cost make optimization difficult. It augments GRPO with replayed successful experience and task selection, establishing a stronger OSWorld training baseline than prior policy-optimization approaches.
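
To illustrate why replayed successes can help under sparse rewards, here is a minimal sketch of group-relative advantages combined with a per-task replay buffer. It is not the authors' implementation: the names `group_advantages`, `ReplayBuffer`, and `build_group`, the rule of swapping in one stored success only when every rollout in a group fails, and the use of standard-deviation normalization are all assumptions made for illustration.

```python
import random
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Group-relative advantage: center and scale rewards within one group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


class ReplayBuffer:
    """Hypothetical per-task store of successful (trajectory, reward) pairs."""

    def __init__(self):
        self._store = {}

    def add(self, task_id, trajectory, reward):
        self._store.setdefault(task_id, []).append((trajectory, reward))

    def sample(self, task_id, rng=random):
        items = self._store.get(task_id)
        return rng.choice(items) if items else None


def build_group(task_id, rollouts, buffer, rng=random):
    """rollouts is a list of (trajectory, reward) pairs for one task.

    Successful rollouts are saved. If every rollout failed (assumed rule),
    one slot is replaced by a stored success so the group has reward variance.
    """
    for trajectory, reward in rollouts:
        if reward > 0:
            buffer.add(task_id, trajectory, reward)

    if all(reward == 0 for _, reward in rollouts):
        replayed = buffer.sample(task_id, rng)
        if replayed is not None:
            rollouts = list(rollouts)
            rollouts[rng.randrange(len(rollouts))] = replayed
    return rollouts
```

In this sketch, a group whose rewards are all zero yields all-zero advantages and therefore no learning signal; inserting a single stored success restores a nonzero spread, which is the intuition behind combining replay with a group-relative objective.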
