GUI Agents Papers

WebCanvas: Benchmarking Web Agents in Online Environments

Yichen Pan, Dehan Kong, Sida Zhou, Cheng Cui, Yifei Leng, Bing Jiang, Hangyu Liu, Yanyi Shang, Shuyan Zhou, Tongshuang Wu, Zhengyang Wu

🏛 Institutions
iMean AI, CMU
📅 Date
June 18, 2024
📑 Publisher
arXiv
💻 Env
Web
🔑 Keywords
TLDR

WebCanvas is an online web-agent benchmark built to evaluate agents against live websites rather than static snapshots. It introduces key-node evaluation for progress-aware scoring, releases Mind2Web-Live with 542 tasks and 2,439 intermediate evaluation states, and provides tooling to annotate and maintain those tasks as the web changes.
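
To make the idea of progress-aware scoring concrete, here is a minimal sketch of how reaching key nodes could be turned into partial credit. It is an illustration only, not the WebCanvas implementation: the `KeyNode` class, the `progress_score` function, the URL-substring predicates, and the in-order matching rule are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class KeyNode:
    """Hypothetical intermediate state a task must pass through."""
    name: str
    reached: Callable[[str], bool]  # predicate over an observed URL (assumed)


def progress_score(key_nodes: List[KeyNode], visited_urls: List[str]) -> float:
    """Return the fraction of key nodes reached, matching them in order."""
    if not key_nodes:
        return 0.0
    next_idx = 0  # index of the next key node still to be reached
    for url in visited_urls:
        if next_idx < len(key_nodes) and key_nodes[next_idx].reached(url):
            next_idx += 1
    return next_idx / len(key_nodes)


# Hypothetical shopping task with three key nodes.
nodes = [
    KeyNode("search results", lambda u: "/search" in u),
    KeyNode("product page", lambda u: "/item/" in u),
    KeyNode("checkout", lambda u: "/checkout" in u),
]

# The agent reaches the product page but never checks out.
partial_run = [
    "https://shop.example/",
    "https://shop.example/search?q=mug",
    "https://shop.example/item/42",
]
score = progress_score(nodes, partial_run)
```

Under these assumptions the partial run earns credit for two of the three key nodes instead of a flat failure, which is the kind of signal a binary success metric would hide.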
