WebGraphEval: Multi-Turn Trajectory Evaluation for Web Agents using Graph Representation
Yaoyao Qian, Yuanli Wang, Jinda Zhang, Yun Zong, Meixu Chen, Hanhan Zhou, Jindan Huang, Yifan Zeng, Xinyu Hu, Chan Hee Song, Danqing Zhang
- 🏛 Institutions
- Northeastern University, Boston University, University of Victoria, University of Minnesota, George Washington University, Tufts University, Oregon State University, University of Texas at San Antonio, OSU, PathOnAI.org
- 📅 Date
- October 22, 2025
- 📑 Publisher
- NeurIPS 2025 Workshop on Multi-Turn Interactions in Large Language Models
- 💻 Env
- Web
- 🔑 Keywords
TLDR
WebGraphEval evaluates web agents by converting many interaction trajectories into a unified weighted action graph instead of scoring only final success or conformity to one reference path. This graph view highlights redundancy, inefficiency, and critical decision points across agents and benchmark runs.
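As a rough illustration of the idea in the TLDR, the sketch below merges several trajectories into one weighted directed graph, then reads off simple signals for redundancy and decision points. It is not the paper's implementation: the node labels, the sample trajectories, and the helper names (`build_action_graph`, `branching_nodes`, `redundant_steps`) are all hypothetical. Edge weights here are plain traversal counts, which may differ from the weighting the authors use.

```python
from collections import Counter

def build_action_graph(trajectories):
    """Merge trajectories into one weighted graph: (src, dst) -> traversal count."""
    edges = Counter()
    for traj in trajectories:
        # Each consecutive pair of steps counts as one traversal of that edge.
        for src, dst in zip(traj, traj[1:]):
            edges[(src, dst)] += 1
    return edges

def branching_nodes(edges):
    """Nodes with more than one distinct successor: candidate decision points."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, set()).add(dst)
    return {node for node, dsts in successors.items() if len(dsts) > 1}

def redundant_steps(traj):
    """Count steps in one trajectory that revisit an already-seen node."""
    return len(traj) - len(set(traj))

# Hypothetical trajectories from three agent runs on the same task.
trajectories = [
    ["home", "search", "results", "product", "cart"],
    ["home", "search", "results", "search", "results", "product", "cart"],
    ["home", "category", "product", "cart"],
]

graph = build_action_graph(trajectories)
decision_points = branching_nodes(graph)
loops_per_run = [redundant_steps(t) for t in trajectories]
```

In this toy data, the second run loops back from `results` to `search` once, so it shows two redundant steps, while `home` and `results` are the nodes where runs diverge. A final-success metric would score all three runs identically.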
Related papers
- Same Outcomes, Different Journeys: A Trace-Level Framework for Comparing Human and GUI-Agent Behavior in Production Search Systems · April 9, 2026 · arXiv
- AI Planning Framework for LLM-Based Web Agents · March 13, 2026 · arXiv
- An Illusion of Progress? Assessing the Current State of Web Agents · April 2, 2025 · COLM 2025
- GUIDE: Interpretable GUI Agent Evaluation via Hierarchical Diagnosis · April 6, 2026 · arXiv
- CUAAudit: Meta-Evaluation of Vision-Language Models as Auditors of Autonomous Computer-Use Agents · March 11, 2026 · HEAL @ CHI 2026 Workshop
- MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments · February 3, 2026 · arXiv