GUI Agents Papers

VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks

Jing Yu Koh, Robert Lo, Lawrence Jang, Vikram Duvvur, Ming Chong Lim, Po-Yu Huang, Graham Neubig, Shuyan Zhou, Ruslan Salakhutdinov, Daniel Fried

🏛 Institutions
CMU
📅 Date
January 24, 2024
📑 Publisher
ACL 2024
💻 Env
Web
TLDR

VisualWebArena is a benchmark of 910 visually grounded web tasks across Classifieds, Shopping, and Reddit environments. Built on WebArena's self-hosted setup, it targets multimodal web agents that must use image-text inputs rather than text alone.
