OpenCUA: Open Foundations for Computer-Use Agents
Xinyuan Wang, Bowen Wang, Dunjie Lu, Junlin Yang, Tianbao Xie, Junli Wang, Jiaqi Deng, Xiaole Guo, Yiheng Xu, Chen Henry Wu, Zhennan Shen, Zhuokai Li, Ryan Li, Xiaochuan Li, Junda Chen, Boyuan Zheng, Peihang Li, Fangyu Lei, Ruisheng Cao, Yeqiao Fu, Dongchan Shin, Martin Shin, Jiarui Hu, Yuyan Wang, Jixuan Chen, Yuxiao Ye, Danyang Zhang, Dikang Du, Hao Hu, Huarong Chen, Zaida Zhou, Haotian Yao, Ziwei Chen, Qizheng Gu, Yipu Wang, Heng Wang, Diyi Yang, Victor Zhong, Flood Sung, Y.Charles, Zhilin Yang, Tao Yu
- 🏛 Institutions
- XLANG Lab, HKU, Moonshot AI, Stanford, University of Waterloo, CMU
- 📅 Date
- August 12, 2025
- 📑 Publisher
- NeurIPS 2025 (Spotlight)
- 💻 Env
- Desktop Web
- 🔑 Keywords
OpenCUA is an open-source computer-use stack centered on AgentNet Tool for demonstration capture, AgentNet for large-scale cross-platform trajectories, and a training pipeline that adds reflective long chain-of-thought supervision. The paper reports strong open-model results on OSWorld-Verified and argues that cross-platform data and test-time reasoning both materially improve agent performance.
- SpiritSight Agent: Advanced GUI Agent with One LookMarch 5, 2025 · CVPR 2025 (Poster)
- OS-ATLAS: A Foundation Action Model for Generalist GUI AgentsOctober 30, 2024 · ICLR 2025 (Spotlight)
- MolmoWeb: Open Visual Web Agent and Open Data for the Open WebApril 9, 2026 · arXiv
- ShowUI-π: Flow-based Generative Models as GUI Dexterous HandsDecember 31, 2025 · arXiv
- Web-Shepherd: Advancing PRMs for Reinforcing Web AgentsMay 21, 2025 · NeurIPS 2025 (Spotlight)
- Efficient Agent Training for Computer UseMay 20, 2025 · ICLR 2026 (Poster)