GUI Agents Papers

OS-Marathon: Benchmarking Computer-Use Agents on Long-Horizon Repetitive Tasks

Jing Wu, Daphne Barretto, Yiye Chen, Nicholas Gydé, Yanan Jian, Yuhang He, Vibhav Vineet

🏛 Institutions
Oxford, Microsoft, Georgia Tech
📅 Date
January 28, 2026
📑 Publisher
arXiv
💻 Env
Desktop
TLDR

OS-Marathon benchmarks computer-use agents on 242 long-horizon repetitive desktop workflows such as expense processing and grade entry. The paper also introduces a few-shot condensed-demonstration method for teaching the recurring sub-workflow logic behind these tasks.
