CoAct-1: Computer-using Multi-Agent System with Coding Actions

Linxin Song, Yutong Dai, Viraj Prabhu, Jieyu Zhang, Taiwei Shi, Li Li, Junnan Li, Silvio Savarese, Zeyuan Chen, Jieyu Zhao, Ran Xu, Caiming Xiong

🏛 Institutions: USC, Salesforce AI Research, University of Washington
📅 Date: August 5, 2025
📑 Publisher: ICLR 2026 (Poster)
💻 Env: Desktop
🔑 Keywords: coding actions, programmer agent, orchestrator, OSWorld, WindowsAgentArena, CoAct-1

TLDR

CoAct-1 augments desktop GUI control with direct Python and Bash execution by letting an orchestrator assign subtasks to either a GUI operator or a programmer agent. On OSWorld and WindowsAgentArena, this hybrid setup reduces brittle GUI-only action chains and improves both success rate and step efficiency.
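
The delegation pattern described above can be illustrated with a minimal sketch. This is not the authors' implementation: the names `route`, `programmer`, `gui_operator`, and `orchestrate` are hypothetical, and the keyword heuristic is only a stand-in for the orchestrator's model-driven decision, which the summary does not specify.

```python
from typing import Callable, Dict, List

# Hypothetical hints that a subtask is better solved by a script than by clicks.
CODE_HINTS = ("file", "rename", "csv", "batch", "convert", "install")


def route(description: str) -> str:
    """Pick an agent for a subtask (assumption: toy keyword heuristic)."""
    text = description.lower()
    return "programmer" if any(hint in text for hint in CODE_HINTS) else "gui_operator"


def programmer(description: str) -> str:
    """Placeholder for an agent that would write and run Python or Bash."""
    return f"[script] {description}"


def gui_operator(description: str) -> str:
    """Placeholder for an agent that would issue mouse and keyboard actions."""
    return f"[gui] {description}"


AGENTS: Dict[str, Callable[[str], str]] = {
    "programmer": programmer,
    "gui_operator": gui_operator,
}


def orchestrate(subtasks: List[str]) -> List[str]:
    """Dispatch each subtask to the chosen agent and collect the results."""
    return [AGENTS[route(task)](task) for task in subtasks]


if __name__ == "__main__":
    for line in orchestrate([
        "Rename all CSV files in ~/data",
        "Click the Bold button in the toolbar",
    ]):
        print(line)
```

In this sketch the file-oriented subtask goes to the programmer agent as a single scripted step, while the toolbar interaction stays with the GUI operator. That split is the intuition behind the step-efficiency claim, under the stated assumptions.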
