CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents

Xiangru Jian, Shravan Nayak, Kevin Qinghong Lin, Aarash Feizi, Kaixin Li, Patrice Bechard, Spandana Gella, Sai Rajeswar

🏛 Institutions: ServiceNow, University of Waterloo, Mila, Université de Montréal, McGill University, Oxford, NUS
📅 Date: March 25, 2026
📑 Publisher: arXiv
💻 Env: Desktop
🔑 Keywords: dataset, video demonstrations, desktop workflows, grounding dataset, VideoCUA, GroundCUA, UI-Vision, CUA-Suite

TLDR

CUA-Suite is a large-scale desktop-agent data ecosystem centered on continuous expert video rather than sparse screenshots. It combines VideoCUA, UI-Vision, and GroundCUA to provide 55 hours of demonstrations, dense grounding annotations, and evaluation data across 87 professional desktop applications where current foundation action models still fail frequently.
