Succeed or Learn Slowly: Sample Efficient Off-Policy Reinforcement Learning for Mobile App Control

Georgios Papoudakis, Thomas Coste, Jianye Hao, Jun Wang, Kun Shao

🏛 Institutions: Huawei Noah’s Ark Lab, UCL
📅 Date: September 1, 2025
📑 Publisher: NeurIPS 2025 (Poster)
💻 Env: Mobile
🔑 Keywords: off-policy reinforcement learning, positive-sample updates, negative-sample regularization, successful transition replay, AndroidWorld, SoLS, STR

TLDR

SoLS is an off-policy RL algorithm for mobile app control that updates the policy directly on positive (successful) samples but applies conservative, regularized updates to negative ones, which avoids policy degradation in sparse-reward settings. Combined with Successful Transition Replay (STR), it substantially improves AndroidWorld performance while using far less compute than GPT-4o-based baselines.
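
The asymmetric update described above can be illustrated with a minimal sketch. It is not the paper's implementation: the exact SoLS objective is not given here, so the sketch assumes a plain policy-gradient surrogate for positive samples and a PPO-style clipped ratio as the "conservative" update for negative ones. The names `asymmetric_policy_loss`, `SuccessfulTransitionReplay`, and `clip_eps` are hypothetical.

```python
import math
import random
from collections import deque


def asymmetric_policy_loss(logp_new, logp_old, advantage, clip_eps=0.2):
    """Per-sample loss (assumed form, not the paper's exact objective)."""
    if advantage > 0:
        # Positive sample: direct, unclipped policy-gradient surrogate.
        return -advantage * logp_new
    # Negative sample: PPO-style clipping (an assumption) caps how far the
    # action probability can be pushed down relative to the old policy.
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    return -min(ratio * advantage, clipped * advantage)


class SuccessfulTransitionReplay:
    """Hypothetical buffer that keeps only transitions from successful episodes."""

    def __init__(self, capacity=1000):
        self.buffer = deque(maxlen=capacity)

    def add_episode(self, transitions, success):
        # Failed episodes are discarded; only successful ones are replayed.
        if success:
            self.buffer.extend(transitions)

    def sample(self, k, rng=random):
        # Sample without replacement, capped at the current buffer size.
        k = min(k, len(self.buffer))
        return rng.sample(list(self.buffer), k)
```

Under these assumptions, once a negative sample's probability ratio falls below `1 - clip_eps` the loss becomes constant, so further updates stop lowering that action's probability, while positive samples are never clipped.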
