CoAct-1: Computer-using Multi-Agent System with Coding Actions
Linxin Song , Yutong Dai , Viraj Prabhu , Jieyu Zhang , Taiwei Shi , Li Li , Junnan Li , Silvio Savarese , Zeyuan Chen , Jieyu Zhao , Ran Xu , Caiming Xiong
- 🏛 Institutions
- USC , Salesforce AI Research , University of Washington
- 📅 Date
- August 5, 2025
- 📑 Publisher
- ICLR 2026 (Poster)
- 💻 Env
- Desktop
- 🔑 Keywords
TLDR
CoAct-1 augments desktop GUI control with direct Python and Bash execution by letting an orchestrator assign subtasks to either a GUI operator or a programmer agent. On OSWorld and WindowsAgentArena, this hybrid setup reduces brittle GUI-only action chains and improves both success rate and step efficiency.
Related papers (24)
- Watch and Learn: Learning to Use Computers from Online VideosOctober 6, 2025 · CVPR 2026
- Agent S2: A Compositional Generalist-Specialist Framework for Computer Use AgentsApril 1, 2025 · COLM 2025
- IntentScore: Intent-Conditioned Action Evaluation for Computer-Use AgentsApril 6, 2026 · arXiv
- GUIDE: Resolving Domain Bias in GUI Agents through Real-Time Web Video Retrieval and Plug-and-Play AnnotationMarch 27, 2026 · arXiv
- EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic ExperienceJanuary 22, 2026 · arXiv
- CaMeLs Can Use Computers Too: System-level Security for Computer Use AgentsJanuary 14, 2026 · arXiv
- Scaling Agents for Computer UseOctober 2, 2025 · arXiv
- Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data CurationSeptember 28, 2025 · arXiv
- ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use AgentsAugust 19, 2025 · ICLR 2026 (Poster)
- Evolving in Tasks: Empowering the Multi-modality Large Language Model as the Computer Use AgentAugust 6, 2025 · arXiv
- LiteCUA: Computer as MCP Server for Computer-Use Agent on AIOSMay 24, 2025 · arXiv
- Attacking Vision-Language Computer Agents via Pop-upsNovember 4, 2024 · ACL 2025
- OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer EnvironmentsApril 11, 2024 · NeurIPS 2024 Datasets and Benchmarks Track
- Agent Alpha: Tree Search Unifying Generation, Exploration and Evaluation for Computer-Use AgentsFebruary 3, 2026 · arXiv
- Agentic Reward Modeling: Verifying GUI Agent via Online Proactive InteractionJanuary 31, 2026 · arXiv
- BEAP-Agent: Backtrackable Execution and Adaptive Planning for GUI AgentsJanuary 29, 2026 · arXiv
- From Off-Policy to On-Policy: Enhancing GUI Agents via Bi-level Expert-to-Policy AssimilationJanuary 9, 2026 · arXiv
- R-WoM: Retrieval-augmented World Model For Computer-use AgentsOctober 13, 2025 · ICLR 2026 (Poster)
- Just Do It!? Computer-Use Agents Exhibit Blind Goal-DirectednessOctober 2, 2025 · ICLR 2026 (Poster)
- Workflow-GYM: Towards Long-Horizon Evaluation of Computer-use Agentic tasks in Real-World Professional FieldsJune 9, 2026 · arXiv
- A11y-Compressor: A Framework for Enhancing the Efficiency of GUI Agent Observations through Visual Context Reconstruction and Redundancy ReductionMay 1, 2026 · arXiv
- WindowsWorld: A Process-Centric Benchmark of Autonomous GUI Agents in Professional Cross-Application EnvironmentsApril 30, 2026 · arXiv
- VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI AutomationApril 23, 2026 · arXiv
- The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use AgentsApril 12, 2026 · arXiv