GUI Agents Papers

UltraCUA: A Foundation Model for Computer Use Agents with Hybrid Action

Yuhao Yang, Zhen Yang, Zi-Yi Dou, Anh Nguyen, Keen You, Omar Attia, Andrew Szot, Michael Feng, Ram Ramrakhya, Alexander Toshev, Chao Huang, Yinfei Yang, Zhe Gan

🏛 Institutions
Apple, HKU
📅 Date
October 20, 2025
📑 Publisher
arXiv
💻 Env
Desktop, Web
🔑 Keywords
TLDR

UltraCUA bridges low-level GUI actions and higher-level tool use in one computer-use model instead of forcing every task through clicks, typing, and scrolling alone. Its pipeline combines automated tool extraction, synthetic verifiable tasks, supervised fine-tuning, and online RL, and the resulting hybrid-action models improve both OSWorld performance and transfer to WindowsAgentArena.
