GUI Agents Papers

An Illusion of Progress? Assessing the Current State of Web Agents

Tianci Xue, Weijian Qi, Tianneng Shi, Chan Hee Song, Boyu Gou, Dawn Song, Huan Sun, Yu Su

🏛 Institutions
OSU, UC Berkeley
📅 Date
April 2, 2025
📑 Publisher
COLM 2025
💻 Env
Web
TLDR

This paper argues that reported web-agent progress is overstated once agents are evaluated on more realistic online tasks. It introduces Online-Mind2Web, a benchmark of 300 tasks across 136 live websites, and pairs it with WebJudge, an automatic evaluation method. Evaluated under this setup, current web agents appear considerably less capable than prior benchmarks suggest.
