GUI Agents Papers

ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay

Fanbin Lu, Zhisheng Zhong, Shu Liu, Chi-Wing Fu, Jiaya Jia

🏛 Institutions
CUHK, SmartMore, HKUST
📅 Date
May 22, 2025
📑 Publisher
arXiv
💻 Env
Desktop
🔑 Keywords
TLDR

ARPO studies end-to-end reinforcement learning for GUI agents in long-horizon desktop environments where sparse rewards and rollout cost make optimization difficult. It augments GRPO with replayed successful experience and task selection, establishing a stronger OSWorld training baseline than prior policy-optimization approaches.
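
The idea of combining GRPO with replayed successes can be illustrated with a minimal sketch. It is not the authors' implementation: the names `SuccessReplay`, `build_group`, and `select_tasks`, the binary reward convention, and the rule of swapping one failed rollout for a cached success when a whole group fails are all assumptions made for illustration. The sketch shows why replay helps under sparse rewards: a group in which every rollout scores zero has zero reward variance, so its group-relative advantages carry no learning signal until a success is injected.

```python
import random
import statistics
from collections import defaultdict


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize each reward within its rollout group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


class SuccessReplay:
    """Hypothetical per-task buffer holding only successful trajectories."""

    def __init__(self):
        self._buffer = defaultdict(list)

    def add(self, task_id, trajectory, reward):
        self._buffer[task_id].append((trajectory, reward))

    def sample(self, task_id, rng=random):
        items = self._buffer.get(task_id)
        return rng.choice(items) if items else None


def build_group(task_id, rollouts, replay, rng=random):
    """rollouts: list of (trajectory, reward) pairs sampled for one task.

    Assumed rule: if every rollout failed, replace one at random with a
    cached success so the group has non-zero reward variance.
    """
    for trajectory, reward in rollouts:
        if reward > 0:
            replay.add(task_id, trajectory, reward)

    group = list(rollouts)
    if all(reward == 0 for _, reward in group):
        cached = replay.sample(task_id, rng)
        if cached is not None:
            group[rng.randrange(len(group))] = cached

    rewards = [reward for _, reward in group]
    return group, group_advantages(rewards)


def select_tasks(success_rates, min_rate=0.0):
    """Assumed task filter: keep tasks the current policy sometimes solves."""
    return [task for task, rate in success_rates.items() if rate > min_rate]
```

With an empty buffer, an all-failure group yields all-zero advantages; once a success for the same task has been cached, a later all-failure group receives one replayed trajectory with a positive advantage while the remaining failures get negative ones.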
