CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning

Zeyi Sun, Yuhang Cao, Jianze Liang, Qiushi Sun, Ziyu Liu, Zhixiong Zhang, Yuhang Zang, Xiaoyi Dong, Kai Chen, Dahua Lin, Jiaqi Wang

🏛 Institutions: SJTU, Shanghai AI Laboratory, CUHK, HKU
📅 Date: August 27, 2025
📑 Publisher: arXiv
💻 Env: Desktop
🔑 Keywords: dual-brain architecture, decoupled GRPO, planner-executor coordination, ScienceBoard, specialization-to-generalization, CODA

TLDR

CODA is a trainable planner-executor composition for specialized computer-use tasks, where a generalist planner is paired with a specialist executor and improved through a two-stage specialization-then-generalization pipeline. On ScienceBoard's scientific software tasks, it uses decoupled GRPO to train application-specific planners and then consolidates successful trajectories into a stronger cross-domain planner.
