GUI Agents Papers

Web-Shepherd: Advancing PRMs for Reinforcing Web Agents

Hyungjoo Chae, Sunghwan Kim, Junhee Cho, Seungone Kim, Seungjun Moon, Gyeom Hwangbo, Dongha Lim, Minjin Kim, Yeonjun Hwang, Minju Gwak, Dongwook Choi, Minseok Kang, Gwanhoon Im, ByeongUng Cho, Hyojun Kim, Jun Hee Han, Taeyoon Kwon, Minju Kim, Beong-woo Kwak, Dongjin Kang, Jinyoung Yeo

🏛 Institutions
Yonsei University, CMU
📅 Date
May 21, 2025
📑 Publisher
NeurIPS 2025 (Spotlight)
💻 Env
Web
TLDR

Web-Shepherd introduces the first process reward model specialized for web navigation, along with the WebPRM Collection of 40K step-level preference pairs and the WebRewardBench meta-evaluation benchmark. It substantially outperforms generic frontier-model verifiers on web trajectories while reducing verification cost enough for both RL training and test-time use.
