CoAct-1: Computer-using Multi-Agent System with Coding Actions
Linxin Song, Yutong Dai, Viraj Prabhu, Jieyu Zhang, Taiwei Shi, Li Li, Junnan Li, Silvio Savarese, Zeyuan Chen, Jieyu Zhao, Ran Xu, Caiming Xiong
- 🏛 Institutions
- USC, Salesforce AI Research, University of Washington
- 📅 Date
- August 5, 2025
- 📑 Publisher
- ICLR 2026 (Poster)
- 💻 Env
- Desktop
TLDR
CoAct-1 augments desktop GUI control with direct Python and Bash execution: an orchestrator assigns each subtask to either a GUI operator or a programmer agent. On OSWorld and WindowsAgentArena, this hybrid setup reduces reliance on long, brittle GUI-only action chains and improves both success rate and step efficiency.
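The orchestrator/worker split can be illustrated with a minimal Python sketch. Everything in it is a hypothetical illustration, not CoAct-1's code: the names `Subtask`, `route`, `gui_operator`, `programmer` and `orchestrate` are invented here, the `scriptable` flag stands in for the orchestrator's own decision-making, and both agents are stubs that return strings instead of acting on a desktop.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical sketch of orchestrator-based routing; names, the `scriptable`
# flag and the stub agents are illustrative assumptions, not the paper's code.


@dataclass
class Subtask:
    description: str
    # Assumed stand-in for the orchestrator's judgment: can this step be
    # done more reliably as a script than as a sequence of GUI actions?
    scriptable: bool


def gui_operator(task: Subtask) -> str:
    # Stub: a real GUI operator would click and type based on screenshots.
    return f"[GUI] {task.description}"


def programmer(task: Subtask) -> str:
    # Stub: a real programmer agent would write and run Python/Bash code.
    return f"[CODE] {task.description}"


AGENTS = {"gui_operator": gui_operator, "programmer": programmer}


def route(task: Subtask) -> str:
    # Send scriptable work to the programmer, everything else to the GUI operator.
    return "programmer" if task.scriptable else "gui_operator"


def orchestrate(subtasks: list[Subtask]) -> list[tuple[str, str]]:
    # Dispatch each subtask in order and record (agent name, agent result).
    trace = []
    for task in subtasks:
        agent_name = route(task)
        trace.append((agent_name, AGENTS[agent_name](task)))
    return trace


if __name__ == "__main__":
    plan = [
        Subtask("Rename all .jpeg files in ~/Pictures to .jpg", scriptable=True),
        Subtask("Toggle dark mode in the Settings app", scriptable=False),
    ]
    for agent, result in orchestrate(plan):
        print(agent, "->", result)
```

The point of the split is that a bulk file operation becomes one programmer action rather than many clicks, while steps that only exist in the GUI still go to the operator.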
Related papers
- Watch and Learn: Learning to Use Computers from Online Videos · October 6, 2025 · CVPR 2026
- Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents · April 1, 2025 · COLM 2025
- IntentScore: Intent-Conditioned Action Evaluation for Computer-Use Agents · April 6, 2026 · arXiv
- GUIDE: Resolving Domain Bias in GUI Agents through Real-Time Web Video Retrieval and Plug-and-Play Annotation · March 27, 2026 · arXiv
- EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience · January 22, 2026 · arXiv
- CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents · January 14, 2026 · arXiv