GUI Agents Papers

BEARCUBS: A benchmark for computer-using web agents

Yixiao Song, Katherine Thai, Chau Minh Pham, Yapei Chang, Mazin Nadaf, Mohit Iyyer

🏛 Institutions
UMass Amherst, UMD
📅 Date
March 10, 2025
📑 Publisher
COLM 2025
💻 Env
Web
TLDR

BEARCUBS is a benchmark of 111 information-seeking questions that require web agents to operate on live websites rather than static replicas. Its tasks require multimodal interactions such as video understanding and 3D navigation, and each question comes with a short answer and a human-validated browsing trajectory for transparent evaluation.
