
An Illusion of Progress? Assessing the Current State of Web Agents

Tianci Xue, Weijian Qi, Tianneng Shi, Chan Hee Song, Boyu Gou, Dawn Song, Huan Sun, Yu Su

🏛 Institutions
OSU, UC Berkeley
📅 Date
April 2, 2025
📑 Publisher
COLM 2025
💻 Env
Web
TLDR

This paper argues that reported web-agent progress is overstated once agents are evaluated on more realistic online tasks. It introduces Online-Mind2Web with 300 tasks across 136 live websites, pairs it with the WebJudge automatic evaluation method, and uses that setup to show a much weaker picture of current web-agent capability than prior benchmarks suggest.
