WebArena: A Realistic Web Environment for Building Autonomous Agents
Shuyan Zhou, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, Uri Alon, Graham Neubig
- 🏛 Institutions
- CMU, Inspired Cognition
- 📅 Date
- July 25, 2023
- 📑 Publisher
- ICLR 2024
- 💻 Env
- Web
- 🔑 Keywords
TLDR
Introduces WebArena, a realistic and reproducible web environment built from fully functional sites across several common domains. It helped establish the modern web-agent evaluation stack by pairing realistic websites, external tools and knowledge sources, and long-horizon benchmark tasks with functional correctness checks.
Related papers
- When Users Change Their Mind: Evaluating Interruptible Agents in Long-Horizon Web Navigation · April 1, 2026 · arXiv
- WebArena-Infinity: Generating Browser Environments with Verifiable Tasks at Scale · March 2026 · Blog Post
- WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents · July 31, 2022 · NeurIPS 2022
- EntWorld: A Holistic Environment and Benchmark for Verifiable Enterprise GUI Agents · January 25, 2026 · arXiv
- ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows · May 26, 2025 · ICLR 2026 (Poster)
- Odysseys: Benchmarking Web Agents on Realistic Long Horizon Tasks · April 27, 2026 · arXiv