GUI Agents Papers

Succeed or Learn Slowly: Sample Efficient Off-Policy Reinforcement Learning for Mobile App Control

Georgios Papoudakis, Thomas Coste, Jianye Hao, Jun Wang, Kun Shao

🏛 Institutions
Huawei Noah’s Ark Lab, UCL
📅 Date
September 1, 2025
📑 Publisher
NeurIPS 2025 (Poster)
💻 Env
Mobile
🔑 Keywords
TLDR

SoLS (Succeed or Learn Slowly) is an off-policy RL algorithm for mobile app control. It updates the policy directly on successful samples and applies conservative, regularized updates on unsuccessful ones, which avoids policy degradation in sparse-reward settings. Combined with Successful Transition Replay, which reuses transitions from successful episodes, it substantially improves AndroidWorld performance while using far less compute than GPT-4o-based baselines.
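The asymmetric update described above can be illustrated with a minimal sketch. This is not the paper's objective: the PPO-style clipping used for negative samples is a stand-in chosen here for illustration, and the names `asymmetric_loss`, `SuccessReplay`, and `clip_eps` are hypothetical.

```python
import math
import random
from collections import deque


def asymmetric_loss(log_prob, old_log_prob, advantage, clip_eps=0.2):
    """Per-sample policy loss with asymmetric treatment (illustrative only).

    Positive advantage: plain policy-gradient term, so the policy moves
    directly toward the successful action.
    Non-positive advantage: a clipped importance-ratio term (an assumed
    stand-in for a conservative update) that goes flat once the action's
    probability has dropped below (1 - clip_eps) of its old value.
    """
    if advantage > 0:
        return -advantage * log_prob
    ratio = math.exp(log_prob - old_log_prob)
    clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return -min(ratio * advantage, clipped * advantage)


class SuccessReplay:
    """Hypothetical buffer that keeps transitions from successful episodes only."""

    def __init__(self, capacity=1000):
        self.buffer = deque(maxlen=capacity)

    def add_episode(self, transitions, succeeded):
        # Failed episodes are ignored; successful ones are stored for reuse.
        if succeeded:
            self.buffer.extend(transitions)

    def sample(self, k, rng=random):
        # Sample without replacement, capped at the current buffer size.
        k = min(k, len(self.buffer))
        return rng.sample(list(self.buffer), k)
```

In this sketch, once a penalized action's probability ratio falls below `1 - clip_eps`, the loss stops changing, so further negative samples no longer push the policy away from its previous behavior.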
