GUI Agents Papers

AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?

Ori Yoran, Samuel Joseph Amouyal, Chaitanya Malaviya, Ben Bogin, Ofir Press, Jonathan Berant

🏛 Institutions
Tel Aviv University, University of Pennsylvania, Allen Institute for AI, University of Washington, Princeton
📅 Date
October 21, 2024
📑 Publisher
EMNLP 2024 (Poster)
💻 Env
Web
TLDR

Introduces AssistantBench, a benchmark of 214 realistic and time-consuming web tasks that require sustained planning, retrieval, and synthesis rather than short web interactions. The paper also proposes the SPA agent and shows that even strong models still struggle on these open-web tasks.
