GUI Agents Papers
Star · 751

Web-Shepherd: Advancing PRMs for Reinforcing Web Agents

Hyungjoo Chae, Sunghwan Kim, Junhee Cho, Seungone Kim, Seungjun Moon, Gyeom Hwangbo, Dongha Lim, Minjin Kim, Yeonjun Hwang, Minju Gwak, Dongwook Choi, Minseok Kang, Gwanhoon Im, ByeongUng Cho, Hyojun Kim, Jun Hee Han, Taeyoon Kwon, Minju Kim, Beong-woo Kwak, Dongjin Kang, Jinyoung Yeo

🏛 Institutions
Yonsei University, CMU
📅 Date
May 21, 2025
📑 Publisher
NeurIPS 2025 (Spotlight)
💻 Env
Web
🔑 Keywords
TLDR

Web-Shepherd introduces the first process reward model specialized for web navigation, along with the WebPRM Collection of 40K step-level preference pairs and the WebRewardBench meta-evaluation benchmark. It substantially outperforms generic frontier-model verifiers on web trajectories while reducing verification cost enough for both RL training and test-time use.

Open paper arXiv Edit on GitHub Report issue
Related papers