Why Do LLM-based Web Agents Fail? A Hierarchical Planning Perspective
Mohamed Aghzal, Gregory J. Stein, Ziyu Yao
- 🏛 Institutions: George Mason University
- 📅 Date: March 15, 2026
- 📑 Publisher: arXiv
- 💻 Env: Web
TLDR
This paper analyzes web-agent failures through a three-layer hierarchy of high-level planning, low-level execution, and replanning rather than relying only on end-to-end success. It finds that structured PDDL plans improve strategic planning over natural-language plans, but that execution and grounding remain the dominant reliability bottlenecks.
Related papers
- SecureWebArena: A Holistic Security Evaluation Benchmark for LVLM-based Web Agents · October 11, 2025 · arXiv
- WebSuite: Systematically Evaluating Why Web Agents Fail · June 1, 2024 · arXiv
- VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding? · April 9, 2024 · COLM 2024
- GPT-4V(ision) is a Generalist Web Agent, if Grounded · January 3, 2024 · ICML 2024
- Building Autonomous GUI Navigation via Agentic-Q Estimation and Step-Wise Policy Optimization · February 14, 2026 · arXiv
- Agent S: An Open Agentic Framework that Uses Computers Like a Human · October 10, 2024 · ICLR 2025 (Poster)