CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents
Xiangru Jian, Shravan Nayak, Kevin Qinghong Lin, Aarash Feizi, Kaixin Li, Patrice Bechard, Spandana Gella, Sai Rajeswar
- 🏛 Institutions: ServiceNow, University of Waterloo, Mila, Université de Montréal, McGill University, Oxford, NUS
- 📅 Date: March 25, 2026
- 📑 Publisher: arXiv
- 💻 Env: Desktop
- 🔑 Keywords
TLDR
CUA-Suite is a large-scale desktop-agent data ecosystem centered on continuous expert video rather than sparse screenshots. It combines VideoCUA, UI-Vision, and GroundCUA to provide 55 hours of demonstrations, dense grounding annotations, and evaluation data across 87 professional desktop applications where current foundation action models still fail frequently.
Related papers
- Watch and Learn: Learning to Use Computers from Online Videos · October 6, 2025 · CVPR 2026
- UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction · March 19, 2025 · ICML 2025 (Poster)
- Gym-Anything: Turn any Software into an Agent Environment · April 7, 2026 · arXiv
- Video-Based Reward Modeling for Computer-Use Agents · March 10, 2026 · arXiv
- When Actions Go Off-Task: Detecting and Correcting Misaligned Actions in Computer-Use Agents · February 9, 2026 · arXiv
- ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands · December 31, 2025 · arXiv