SSL: Sweet Spot Learning for Differentiated Guidance in Agentic Optimization

Jinyang Wu, Changpeng Yang, Yuhao Shen, Fangzhi Xu, Bolin Ni, Chonghua Liao, Yuchen Liu, Hongzhen Wang, Shuai Nie, Shuai Zhang, Haoran Luo, Jiaming Xu

🏛 Institutions: Tsinghua, Xiaomi, ZJU, NTU, Institute of Automation, CAS
📅 Date: January 30, 2026
📑 Publisher: arXiv
💻 Env: General GUI
🔑 Keywords: reinforcement learning, tiered rewards, reward shaping, GUI grounding, planning, SSL

TLDR

SSL replaces binary verifier rewards with progressively amplified tiered rewards that distinguish higher- and lower-quality successful trajectories. Across GUI perception, short- and long-horizon planning, and reasoning benchmarks, it improves optimization stability and reaches up to 2.5x better sample efficiency than binary-reward baselines.
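To make the contrast between binary and tiered rewards concrete, here is a minimal Python sketch. It is not the paper's formulation: the `quality` score, the tier thresholds, and the reward values are all hypothetical, chosen only to show how successful trajectories could receive progressively larger rewards while failures still receive zero.

```python
from bisect import bisect_right

# Hypothetical tier boundaries on a trajectory-quality score in [0, 1].
# These values are illustrative assumptions, not taken from the paper.
TIER_THRESHOLDS = (0.5, 0.8)
# Hypothetical rewards per tier, growing faster than linearly so that
# higher-quality successes are progressively amplified.
TIER_REWARDS = (1.0, 2.0, 4.0)


def binary_reward(success: bool) -> float:
    """Binary verifier reward: every successful trajectory scores the same."""
    return 1.0 if success else 0.0


def tiered_reward(success: bool, quality: float) -> float:
    """Tiered reward: failures get 0, successes are ranked by quality tier."""
    if not success:
        return 0.0
    # bisect_right returns how many thresholds the score meets or exceeds,
    # which is the index of the tier the trajectory falls into.
    tier = bisect_right(TIER_THRESHOLDS, quality)
    return TIER_REWARDS[tier]
```

Under these assumed values, two successful trajectories with quality scores 0.6 and 0.9 both receive 1.0 from `binary_reward`, but 2.0 and 4.0 from `tiered_reward`, so the optimizer gets a signal that separates them.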
