Cradle: Empowering Foundation Agents Towards General Computer Control

Weihao Tan, Wentao Zhang, Xinrun Xu, Haochong Xia, Ziluo Ding, Boyu Li, Bohan Zhou, Junpeng Yue, Jiechuan Jiang, Yewen Li, Ruyi An, Molei Qin, Chuqiao Zong, Longtao Zheng, Yujie Wu, Xiaoqiang Chai, Yifei Bi, Tianbao Xie, Pengjie Gu, Xiyun Li, Ceyao Zhang, Long Tian, Chaojie Wang, Xinrun Wang, Börje F. Karlsson, Bo An, Shuicheng Yan, Zongqing Lu

🏛 Institutions: Skywork AI, Beijing Academy of Artificial Intelligence, Nanyang Technological University, Peking University, Institute of Software, Chinese Academy of Sciences, The University of Hong Kong, The Chinese University of Hong Kong, Shenzhen
📅 Date: March 5, 2024
📑 Publisher: arXiv
💻 Env
🔑 Keywords: framework, Cradle, general computer control, screen-only control, memory, self-reflection

TLDR

Cradle formulates general computer control as a setting in which the agent takes screenshots as input and produces keyboard and mouse operations as output, and it instantiates that setting with a modular multimodal agent for software applications and video games. It is relevant to GUI work because it demonstrates screen-only control on real software and is evaluated on OSWorld, although the paper frames its contribution as a broader general-computer-control agenda rather than as a GUI-specific method.
