GUI Agents Papers

VideoAgentTrek: Computer Use Pretraining from Unlabeled Videos

Dunjie Lu, Yiheng Xu, Junli Wang, Haoyuan Wu, Xinyuan Wang, Zekun Wang, Junlin Yang, Hongjin Su, Jixuan Chen, Junda Chen, Yuchen Mao, Jingren Zhou, Junyang Lin, Binyuan Hui, Tao Yu

🏛 Institutions
The University of Hong Kong, Alibaba Group (Qwen Team)
📅 Date
October 22, 2025
📑 Publisher
arXiv
💻 Env
General GUI
🔑 Keywords
TLDR

VideoAgentTrek studies how to pretrain computer-use agents from passive screen recordings instead of manually labeled trajectories. Its Video2Action pipeline recovers action boundaries and structured action parameters from 39,000 tutorial videos, yielding 1.52 million interaction steps. Continued pretraining on this data improves results on both OSWorld-Verified and AgentNetBench.
