GUI Agents Papers

BrowserArena: Evaluating LLM Agents on Real-World Web Navigation Tasks

Sagnik Anupam, Davis Brown, Shuo Li, Eric Wong, Hamed Hassani, Osbert Bastani

🏛 Institutions
University of Pennsylvania
📅 Date
October 2, 2025
📑 Publisher
arXiv
💻 Env
Web
TLDR

BrowserArena is a live open-web evaluation platform that compares web agents on user-submitted tasks with Arena-style head-to-head judgments and step-level human annotations. It surfaces recurring real-world failure modes such as CAPTCHA resolution, pop-up removal, and direct URL navigation, and uses targeted datasets to study how different models handle them.
