OpenClaw-RL: Train Any Agent Simply by Talking

Yinjie Wang, Xuyang Chen, Xiaolong Jin, Mengdi Wang, Ling Yang

🏛 Institutions: University of Chicago, Princeton University, Peking University
📅 Date: March 10, 2026
📑 Publisher: arXiv
💻 Env
🔑 Keywords: reinforcement learning, agent training, next-state signals, process reward model, on-policy distillation, OpenClaw-RL

TLDR

OpenClaw-RL is an asynchronous RL framework that treats next-state signals from live interactions as a universal learning source. It combines scalar rewards from a process-reward judge with hindsight-guided on-policy distillation, and trains agents across conversations, terminals, GUI tasks, software engineering (SWE), and tool use in one online loop.
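To make the two learning signals named above concrete, here is a minimal sketch of how a scalar reward term and an on-policy distillation term could be summed into one per-sample loss. It is an illustration under stated assumptions, not the paper's actual objective: the function name `combined_loss`, the `distill_weight` parameter, the REINFORCE-style reward term, and the reverse-KL estimate are all hypothetical choices made for this example.

```python
def combined_loss(student_logprobs, teacher_logprobs, reward, distill_weight=0.5):
    """Toy per-sample loss over tokens that the student itself sampled.

    student_logprobs: log-probs the student assigned to its own sampled tokens.
    teacher_logprobs: log-probs a hindsight-informed teacher assigns to the
        same tokens (assumption: the teacher has seen the next-state signal).
    reward: scalar score from a judge for this step (assumption).
    distill_weight: hypothetical mixing coefficient between the two terms.
    """
    if len(student_logprobs) != len(teacher_logprobs):
        raise ValueError("student and teacher must score the same tokens")
    if not student_logprobs:
        raise ValueError("need at least one token")
    n = len(student_logprobs)

    # REINFORCE-style term: minimizing -reward * mean(log p) raises the
    # probability of tokens that led to a positive reward.
    policy_term = -reward * sum(student_logprobs) / n

    # On-policy distillation term: a Monte Carlo estimate of the reverse KL
    # divergence KL(student || teacher), evaluated on the student's samples.
    distill_term = sum(s - t for s, t in zip(student_logprobs, teacher_logprobs)) / n

    return policy_term + distill_weight * distill_term
```

When the teacher and student agree on every token, the distillation term vanishes and only the reward term remains; in a real trainer both terms would be computed on tensors and differentiated with respect to the student's parameters.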
