Web-Shepherd: Advancing PRMs for Reinforcing Web Agents

Hyungjoo Chae, Sunghwan Kim, Junhee Cho, Seungone Kim, Seungjun Moon, Gyeom Hwangbo, Dongha Lim, Minjin Kim, Yeonjun Hwang, Minju Gwak, Dongwook Choi, Minseok Kang, Gwanhoon Im, ByeongUng Cho, Hyojun Kim, Jun Hee Han, Taeyoon Kwon, Minju Kim, Beong-woo Kwak, Dongjin Kang, Jinyoung Yeo

🏛 Institutions: Yonsei University, CMU
📅 Date: May 21, 2025
📑 Publisher: NeurIPS 2025 (Spotlight)
💻 Env: Web
🔑 Keywords: model, dataset, benchmark, reward model, WebRewardBench, Web-Shepherd

TLDR

Web-Shepherd introduces the first process reward model specialized for web navigation, along with the WebPRM Collection of 40K step-level preference pairs and the WebRewardBench meta-evaluation benchmark. It substantially outperforms generic frontier-model verifiers on web trajectories while reducing verification cost enough for both RL training and test-time use.
