AI Planning Framework for LLM-Based Web Agents

🏛 Institutions: University of Haifa
📅 Date: March 13, 2026
📑 Publisher: arXiv
💻 Env: Web
🔑 Keywords: AI planning, evaluation metrics, WebArena, trajectory analysis, planning taxonomy

TLDR

This paper maps common LLM-based web-agent designs to classical planning paradigms such as breadth-first search (BFS), best-first tree search, and depth-first search (DFS), then argues that trajectory-level metrics are needed alongside raw success rate. Using 794 human-labeled WebArena trajectories, it shows that different agent architectures optimize different dimensions of performance.
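
To make the three paradigms concrete, here is a minimal Python sketch on a toy state graph. It is an illustration under stated assumptions, not the paper's agents or metrics: the graph `GRAPH`, the heuristic table `SCORE`, and the functions `bfs`, `dfs`, and `best_first` are all hypothetical. Each function returns the path it found and the list of nodes it expanded, so the trajectories can be compared and not just the final success.

```python
import heapq
from collections import deque

# Hypothetical toy state graph: nodes stand in for pages, edges for actions.
GRAPH = {
    "home": ["search", "cart"],
    "search": ["item_a", "item_b"],
    "cart": ["checkout"],
    "item_a": [],
    "item_b": ["checkout"],
    "checkout": [],
}

# Hypothetical heuristic scores (lower means judged closer to the goal).
SCORE = {"home": 3, "search": 2, "cart": 1, "item_a": 3, "item_b": 1, "checkout": 0}


def bfs(start, goal):
    """Breadth-first search: FIFO frontier, shallowest paths first."""
    frontier = deque([[start]])
    visited = {start}
    expanded = []
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        expanded.append(node)
        if node == goal:
            return path, expanded
        for nxt in GRAPH[node]:
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None, expanded


def dfs(start, goal):
    """Depth-first search: LIFO frontier, commits to one branch at a time."""
    stack = [[start]]
    visited = set()
    expanded = []
    while stack:
        path = stack.pop()
        node = path[-1]
        if node in visited:
            continue
        visited.add(node)
        expanded.append(node)
        if node == goal:
            return path, expanded
        # Push in reverse so the first listed action is tried first.
        for nxt in reversed(GRAPH[node]):
            if nxt not in visited:
                stack.append(path + [nxt])
    return None, expanded


def best_first(start, goal):
    """Best-first search: priority frontier ordered by the heuristic score."""
    counter = 0  # tie-breaker so the heap never compares paths
    frontier = [(SCORE[start], counter, [start])]
    visited = set()
    expanded = []
    while frontier:
        _, _, path = heapq.heappop(frontier)
        node = path[-1]
        if node in visited:
            continue
        visited.add(node)
        expanded.append(node)
        if node == goal:
            return path, expanded
        for nxt in GRAPH[node]:
            if nxt not in visited:
                counter += 1
                heapq.heappush(frontier, (SCORE[nxt], counter, path + [nxt]))
    return None, expanded


for search in (bfs, dfs, best_first):
    path, expanded = search("home", "checkout")
    print(search.__name__, "path:", path, "expanded:", len(expanded))
```

On this toy graph all three strategies reach the goal, so a success-only measure cannot tell them apart. The trajectories do differ: BFS expands 6 nodes, DFS expands 5 and returns a longer path, and best-first expands 3. This is the kind of gap that trajectory-level measures are meant to expose.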
