GUI Agents Papers

Scaling Agents for Computer Use

Gonzalo Gonzalez-Pumariega, Vincent Tu, Chih-Lun Lee, Jiachen Yang, Ang Li, Xin Eric Wang

🏛 Institutions
Simular Research
📅 Date
October 2, 2025
📑 Publisher
arXiv
💻 Env
Desktop
🔑 Keywords

TLDR

This paper argues that computer-use agents scale more effectively across multiple rollouts than within a single rollout, and introduces Behavior Judge (BJudge) to compare candidate trajectories via compact behavior narratives. BJudge reaches 72.6% on OSWorld, slightly surpassing reported human performance, and also generalizes to WindowsAgentArena and AndroidWorld.
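The selection idea in the summary above (run several rollouts, condense each into a behavior narrative, and let a judge compare the narratives) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: `Rollout`, `narrate`, `select_best`, and `toy_judge` are hypothetical names, and the judge here is a trivial heuristic standing in for whatever model the authors use.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Rollout:
    """One complete attempt at a task (hypothetical structure)."""
    actions: List[str]


def narrate(rollout: Rollout) -> str:
    """Condense a rollout into a compact behavior narrative.

    Placeholder: simply joins the action descriptions. A real system
    would presumably summarize what each step changed on screen.
    """
    return " -> ".join(rollout.actions)


def select_best(
    rollouts: Sequence[Rollout],
    judge: Callable[[Sequence[str]], int],
) -> Rollout:
    """Return the rollout whose narrative the judge prefers.

    `judge` receives all narratives at once and returns the index
    of the preferred one (an assumed interface).
    """
    if not rollouts:
        raise ValueError("need at least one rollout")
    narratives = [narrate(r) for r in rollouts]
    return rollouts[judge(narratives)]


def toy_judge(narratives: Sequence[str]) -> int:
    """Stand-in judge: prefer narratives that mention saving, then shorter ones."""
    def score(text: str):
        return ("save" in text, -len(text))
    return max(range(len(narratives)), key=lambda i: score(narratives[i]))


candidates = [
    Rollout(["open editor", "type text"]),
    Rollout(["open editor", "type text", "save file"]),
]
best = select_best(candidates, toy_judge)
print(best.actions)
```

With the toy judge, the second candidate is selected because its narrative mentions saving the file. Swapping `toy_judge` for a model-backed comparison would leave `select_best` unchanged, which is the main reason the judge is passed in as a callable.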
