The Art of Building Verifiers for Computer Use Agents

Corby Rosset, Pratyusha Sharma, Andrew Zhao, Miguel Gonzalez-Fernandez, Ahmed Awadallah

🏛 Institutions: MSR
📅 Date: April 5, 2026
📑 Publisher: arXiv
💻 Env: Web
🔑 Keywords: benchmark, reward model, verification, CUAVerifierBench

TLDR

Presents lessons from building the Universal Verifier for web agent trajectories, organized around four principles: meaningful rubrics, separate process and outcome rewards, a distinction between controllable and uncontrollable failures, and divide-and-conquer context management. Reduces false positive rates to near zero, compared with 45%+ for WebVoyager and 22%+ for WebJudge.
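In this setting, a false positive is a trajectory that the verifier judges successful even though it actually failed. The sketch below computes the standard false positive rate, FP / (FP + TN). It assumes binary success verdicts and treats human labels as ground truth. The paper's exact evaluation protocol may differ, and the function name and toy data are hypothetical.

```python
from typing import Sequence


def false_positive_rate(verdicts: Sequence[bool], human_labels: Sequence[bool]) -> float:
    """Fraction of truly failed trajectories that the verifier judged successful.

    verdicts[i] is True when the verifier says trajectory i succeeded.
    human_labels[i] is True when a human annotator says it actually succeeded
    (assumed ground truth for this sketch).
    """
    if len(verdicts) != len(human_labels):
        raise ValueError("verdicts and human_labels must have the same length")
    # False positive: verifier says success, human says failure.
    false_pos = sum(1 for v, h in zip(verdicts, human_labels) if v and not h)
    # True negative: verifier and human both say failure.
    true_neg = sum(1 for v, h in zip(verdicts, human_labels) if not v and not h)
    negatives = false_pos + true_neg
    if negatives == 0:
        # No failed trajectories, so FPR is undefined; this sketch reports 0.0.
        return 0.0
    return false_pos / negatives


# Hypothetical toy data, not taken from the paper.
verdicts = [True, True, False, True, False]
human_labels = [True, False, False, False, False]
print(false_positive_rate(verdicts, human_labels))  # prints 0.5
```

Under this definition, a verifier that is "near zero" almost never approves a trajectory that humans marked as failed. A 45%+ rate would mean that nearly half of the failed trajectories were approved.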
