GUI Agents Papers

WebSuite: Systematically Evaluating Why Web Agents Fail

Eric Li, Jim Waldo

🏛 Institutions
Harvard
📅 Date
June 1, 2024
📑 Publisher
arXiv
💻 Env
Web
🔑 Keywords
TLDR

Introduces WebSuite, a diagnostic benchmark for understanding why web agents fail rather than only whether they fail. It organizes web behavior into a taxonomy of actions and builds both atomic and end-to-end tasks so failures can be traced back to specific action categories.
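The diagnostic idea can be sketched as follows. Success on atomic tasks is aggregated per action category. An end-to-end failure is then attributed to the categories it relies on that score poorly in isolation. This is a minimal illustration, not WebSuite's actual code. The category names ("click", "type", "select"), the helper functions, and the 0.5 threshold are all hypothetical.

```python
from collections import defaultdict


def category_success_rates(atomic_results):
    """Compute per-category success rates from (category, passed) pairs.

    Hypothetical helper: category labels are illustrative and may not match
    the action taxonomy used in WebSuite.
    """
    totals = defaultdict(int)
    passes = defaultdict(int)
    for category, passed in atomic_results:
        totals[category] += 1
        passes[category] += int(passed)
    return {c: passes[c] / totals[c] for c in totals}


def suspected_causes(task_categories, rates, threshold=0.5):
    """List the categories an end-to-end task uses whose atomic success rate
    falls below `threshold` (an assumed cutoff). Categories with no atomic
    results are treated as suspect (rate 0.0)."""
    return sorted(c for c in task_categories if rates.get(c, 0.0) < threshold)


atomic = [("click", True), ("click", True), ("type", False),
          ("type", True), ("select", False)]
rates = category_success_rates(atomic)
print(rates)  # {'click': 1.0, 'type': 0.5, 'select': 0.0}
print(suspected_causes({"click", "type", "select"}, rates))  # ['select']
```

In this sketch, an agent that fails an end-to-end task needing click, type, and select actions would have the failure traced to "select". The reason is that "select" is the only category whose isolated success rate is below the cutoff.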
