WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning

Zhepei Wei, Wenlin Yao, Yao Liu, Weizhi Zhang, Qin Lu, Liang Qiu, Changlong Yu, Puyang Xu, Chao Zhang, Bing Yin, Hyokun Yun, Lihong Li

🏛 Institutions: University of Virginia, Amazon, Georgia Tech
📅 Date: May 22, 2025
📑 Publisher: EMNLP 2025 (Poster)
💻 Env: Web
🔑 Keywords: reinforcement learning, multi-turn interaction, WebArena-Lite, test-time scaling, WebAgent-R1

TLDR

WebAgent-R1 studies end-to-end multi-turn reinforcement learning for web agents rather than single-turn reasoning tasks. It learns directly from online browser interactions with binary success rewards and substantially improves small open models on WebArena-Lite, surpassing prior methods and some proprietary baselines.
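
To make the "multi-turn interaction with a binary success reward" setup concrete, here is a minimal sketch of one episode. It is a toy illustration under stated assumptions, not the paper's implementation: `ToyWebEnv`, `rollout`, `per_turn_returns`, and the action strings are all hypothetical names, and copying the final reward to every turn is just one simple way to credit a sparse outcome signal.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Trajectory:
    # (observation, action) pairs collected over the turns of one episode
    steps: List[Tuple[str, str]] = field(default_factory=list)
    reward: float = 0.0  # binary: 1.0 on task success, otherwise 0.0


class ToyWebEnv:
    """Hypothetical stand-in for a browser (assumption, not a real API).

    The task counts as solved when the agent issues 'click:search'
    followed immediately by 'click:submit' within the turn budget.
    """

    def __init__(self, max_turns: int = 4) -> None:
        self.max_turns = max_turns
        self.turn = 0
        self.history: List[str] = []

    def reset(self) -> str:
        self.turn = 0
        self.history = []
        return "page:home"

    def step(self, action: str) -> Tuple[str, bool, bool]:
        self.turn += 1
        self.history.append(action)
        success = self.history[-2:] == ["click:search", "click:submit"]
        done = success or self.turn >= self.max_turns
        return f"page:after:{action}", done, success


def rollout(env: ToyWebEnv, policy: Callable[[str], str]) -> Trajectory:
    """Run one multi-turn episode; the only reward is the final outcome."""
    traj = Trajectory()
    obs = env.reset()
    done, success = False, False
    while not done:
        action = policy(obs)
        traj.steps.append((obs, action))
        obs, done, success = env.step(action)
    traj.reward = 1.0 if success else 0.0
    return traj


def per_turn_returns(traj: Trajectory) -> List[float]:
    # Illustrative assumption: every turn shares the episode's binary outcome.
    return [traj.reward] * len(traj.steps)


def scripted_policy(obs: str) -> str:
    # Hypothetical fixed policy standing in for a language-model agent.
    return "click:search" if obs == "page:home" else "click:submit"


episode = rollout(ToyWebEnv(), scripted_policy)
returns = per_turn_returns(episode)
```

In an actual training loop the scripted policy would be replaced by the model being optimized, and the per-turn returns would feed a policy-gradient update; both are omitted here because the summary does not specify them.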
