From Grounding to Planning: Benchmarking Bottlenecks in Web Agents
Segev Shlomov, Ben Wiesel, Aviad Sela, Ido Levy, Liane Galanti, Roy Abitbol
- 🏛 Institutions: IBM
- 📅 Date: September 3, 2024
- 📑 Publisher: ECAI 2025
- 💻 Env: Web
- 🔑 Keywords
TLDR
This paper refines Mind2Web into separate planning and grounding benchmarks to diagnose which component is actually limiting web-agent performance. Its analysis argues that planning, not grounding, is the dominant bottleneck, and shows that isolating grounding can already yield near-perfect element accuracy with current techniques.
Related papers
- Mind2Web: Towards a Generalist Agent for the Web · June 9, 2023 · NeurIPS 2023 Datasets and Benchmarks Track
- Odysseys: Benchmarking Web Agents on Realistic Long Horizon Tasks · April 27, 2026 · arXiv
- WebForge: Breaking the Realism-Reproducibility-Scalability Trilemma in Browser Agent Benchmark · April 13, 2026 · arXiv
- The Amazing Agent Race: Strong Tool Users, Weak Navigators · April 11, 2026 · arXiv
- ClawBench: Can AI Agents Complete Everyday Online Tasks? · April 9, 2026 · arXiv
- GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents · April 8, 2026 · arXiv