GUI Agents Papers
Star · 751

Cradle: Empowering Foundation Agents Towards General Computer Control

Weihao Tan, Wentao Zhang, Xinrun Xu, Haochong Xia, Ziluo Ding, Boyu Li, Bohan Zhou, Junpeng Yue, Jiechuan Jiang, Yewen Li, Ruyi An, Molei Qin, Chuqiao Zong, Longtao Zheng, Yujie Wu, Xiaoqiang Chai, Yifei Bi, Tianbao Xie, Pengjie Gu, Xiyun Li, Ceyao Zhang, Long Tian, Chaojie Wang, Xinrun Wang, Börje F. Karlsson, Bo An, Shuicheng Yan, Zongqing Lu

🏛 Institutions
Skywork AI, Beijing Academy of Artificial Intelligence, Nanyang Technological University, Peking University, Institute of Software, Chinese Academy of Sciences, The University of Hong Kong, The Chinese University of Hong Kong, Shenzhen
📅 Date
March 5, 2024
📑 Publisher
arXiv
💻 Env
🔑 Keywords
TLDR

Cradle formulates general computer control as screenshot input plus keyboard-and-mouse output, and instantiates that setting with a modular multimodal agent for software and video games. It matters to GUI work because it demonstrates screen-only control on real software and evaluates on OSWorld, but the paper is framed as a broader general-computer-control agenda rather than a direct GUI paper.

Open paper arXiv Edit on GitHub Report issue
Related papers