WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?
Alexandre Drouin, Maxime Gasse, Massimo Caccia, Issam H. Laradji, Manuel Del Verme, Tom Marty, David Vazquez, Nicolas Chapados, Alexandre Lacoste
- 🏛 Institutions: ServiceNow Research, Mila
- 📅 Date: March 11, 2024
- 📑 Publisher: ICML 2024
- 💻 Env: Web
TLDR
WorkArena is a remote-hosted benchmark of 33 enterprise knowledge-work tasks for browser-based agents, built on the ServiceNow platform. The paper also introduces BrowserGym, an environment for designing and evaluating such agents, and shows that current agents remain well short of full task automation, with a notable performance gap between open-source and closed-source LLMs.
Related papers
- The BrowserGym Ecosystem for Web Agent Research · December 6, 2024 · TMLR
- EntWorld: A Holistic Environment and Benchmark for Verifiable Enterprise GUI Agents · January 25, 2026 · arXiv
- Odysseys: Benchmarking Web Agents on Realistic Long Horizon Tasks · April 27, 2026 · arXiv
- WebForge: Breaking the Realism-Reproducibility-Scalability Trilemma in Browser Agent Benchmark · April 13, 2026 · arXiv
- The Amazing Agent Race: Strong Tool Users, Weak Navigators · April 11, 2026 · arXiv
- ClawBench: Can AI Agents Complete Everyday Online Tasks? · April 9, 2026 · arXiv