UltraCUA: A Foundation Model for Computer Use Agents with Hybrid Action

Yuhao Yang, Zhen Yang, Zi-Yi Dou, Anh Nguyen, Keen You, Omar Attia, Andrew Szot, Michael Feng, Ram Ramrakhya, Alexander Toshev, Chao Huang, Yinfei Yang, Zhe Gan

🏛 Institutions: Apple, HKU
📅 Date: October 20, 2025
📑 Publisher: arXiv
💻 Env: Desktop, Web
🔑 Keywords: model, hybrid action, tool calls, synthetic tasks, online reinforcement learning, UltraCUA

TLDR

UltraCUA bridges low-level GUI actions and higher-level tool use in one computer-use model instead of forcing every task through clicks, typing, and scrolling alone. Its pipeline combines automated tool extraction, synthetic verifiable tasks, supervised fine-tuning, and online RL, and the resulting hybrid-action models improve both OSWorld performance and transfer to WindowsAgentArena.
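To make "hybrid action" concrete, here is a minimal sketch of an action space in which each agent step is either a low-level GUI primitive (click, type, scroll) or a high-level tool call, plus an executor that routes each step to the matching backend. This is an illustration of the general idea under stated assumptions, not UltraCUA's actual action format or runtime: the class names `GuiAction`, `ToolCall`, and `HybridExecutor` and the tool `set_cell` are all hypothetical, and the GUI branch only returns descriptive strings instead of driving a real desktop.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union


@dataclass
class GuiAction:
    """Hypothetical low-level primitive: 'click', 'type', or 'scroll'."""
    kind: str
    x: int = 0
    y: int = 0
    text: str = ""
    dy: int = 0


@dataclass
class ToolCall:
    """Hypothetical high-level action: a named tool plus keyword arguments."""
    name: str
    args: Dict[str, str] = field(default_factory=dict)


# One step of a hybrid policy is either kind of action.
HybridAction = Union[GuiAction, ToolCall]


class HybridExecutor:
    """Routes each action to a (simulated) GUI backend or a tool registry."""

    def __init__(self, tools: Dict[str, Callable[..., str]]):
        self.tools = tools
        self.log: List[str] = []

    def execute(self, action: HybridAction) -> str:
        if isinstance(action, ToolCall):
            if action.name not in self.tools:
                # Report failure so a policy could fall back to GUI primitives.
                result = f"error: unknown tool {action.name}"
            else:
                result = self.tools[action.name](**action.args)
        elif action.kind == "click":
            result = f"clicked ({action.x}, {action.y})"
        elif action.kind == "type":
            result = f"typed {action.text!r}"
        elif action.kind == "scroll":
            result = f"scrolled {action.dy}"
        else:
            result = f"error: unknown GUI action {action.kind}"
        self.log.append(result)
        return result


# Usage: a short trajectory that mixes GUI steps with one hypothetical tool call.
executor = HybridExecutor({"set_cell": lambda cell, value: f"set {cell}={value}"})
plan = [
    GuiAction("click", x=120, y=40),
    ToolCall("set_cell", {"cell": "A1", "value": "42"}),
    GuiAction("type", text="done"),
]
for step in plan:
    print(executor.execute(step))
# prints:
# clicked (120, 40)
# set A1=42
# typed 'done'
```

The design choice being illustrated is that both action kinds share one interface, so a single policy can pick a tool call when one covers the operation and fall back to clicks and typing when no tool applies.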
