BrowserArena: Evaluating LLM Agents on Real-World Web Navigation Tasks

Sagnik Anupam, Davis Brown, Shuo Li, Eric Wong, Hamed Hassani, Osbert Bastani

🏛 Institutions: University of Pennsylvania
📅 Date: October 2, 2025
📑 Publisher: arXiv
💻 Env: Web
🔑 Keywords: live web evaluation, head-to-head ranking, step-level human feedback, failure modes, BrowserArena

TLDR

BrowserArena is a live open-web evaluation platform that compares web agents on user-submitted tasks with Arena-style head-to-head judgments and step-level human annotations. It surfaces recurring real-world failure modes such as CAPTCHA resolution, pop-up removal, and direct URL navigation, and uses targeted datasets to study how different models handle them.
