GUI Agents Papers

OS-Marathon: Benchmarking Computer-Use Agents on Long-Horizon Repetitive Tasks

Jing Wu, Daphne Barretto, Yiye Chen, Nicholas Gydé, Yanan Jian, Yuhang He, Vibhav Vineet

🏛 Institutions
Oxford, Microsoft, Georgia Tech
📅 Date
January 28, 2026
📑 Publisher
arXiv
💻 Env
Desktop
🔑 Keywords
TLDR

OS-Marathon benchmarks computer-use agents on 242 long-horizon repetitive desktop workflows such as expense processing and grade entry. The paper also introduces a few-shot condensed-demonstration method for teaching the recurring sub-workflow logic behind these tasks.
