GUI Agents Papers

Odysseys: Benchmarking Web Agents on Realistic Long Horizon Tasks

Lawrence Keunho Jang, Jing Yu Koh, Daniel Fried, Ruslan Salakhutdinov

🏛 Institutions
CMU
📅 Date
April 27, 2026
📑 Publisher
arXiv
💻 Env
Web
TLDR

Odysseys targets the saturation of short, single-site web-agent benchmarks by curating 200 realistic long-horizon, multi-site workflows graded with 1,225 rubric items. The benchmark exposes large gaps between frontier computer-use agents and humans on extended cross-site reasoning and on maintaining persistent task state.
