Cradle: Empowering Foundation Agents Towards General Computer Control
Weihao Tan, Wentao Zhang, Xinrun Xu, Haochong Xia, Ziluo Ding, Boyu Li, Bohan Zhou, Junpeng Yue, Jiechuan Jiang, Yewen Li, Ruyi An, Molei Qin, Chuqiao Zong, Longtao Zheng, Yujie Wu, Xiaoqiang Chai, Yifei Bi, Tianbao Xie, Pengjie Gu, Xiyun Li, Ceyao Zhang, Long Tian, Chaojie Wang, Xinrun Wang, Börje F. Karlsson, Bo An, Shuicheng Yan, Zongqing Lu
- 🏛 Institutions
- Skywork AI; Beijing Academy of Artificial Intelligence; Nanyang Technological University; Peking University; Institute of Software, Chinese Academy of Sciences; The University of Hong Kong; The Chinese University of Hong Kong, Shenzhen
- 📅 Date
- March 5, 2024
- 📑 Publisher
- arXiv
Cradle formulates general computer control as screenshot input plus keyboard-and-mouse output, and instantiates that setting with a modular multimodal agent for software and video games. It is relevant to GUI work because it demonstrates screen-only control of real software and is evaluated on OSWorld, although the paper frames itself as part of a broader general-computer-control agenda rather than as a GUI-specific contribution.
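The screenshot-in, keyboard-and-mouse-out setting described above can be pictured as a small interface. The sketch below is illustrative only and is not taken from the Cradle codebase: every name in it (`Screenshot`, `KeyPress`, `MouseClick`, `Policy`, `FakeDesktop`, `run_episode`) is hypothetical, and the environment is a stand-in that records actions rather than driving a real desktop.

```python
from dataclasses import dataclass
from typing import List, Protocol, Union


@dataclass(frozen=True)
class Screenshot:
    """Observation: raw pixels only, with no DOM or accessibility tree."""
    width: int
    height: int
    pixels: bytes  # placeholder for real image data


@dataclass(frozen=True)
class KeyPress:
    key: str


@dataclass(frozen=True)
class MouseClick:
    x: int
    y: int
    button: str = "left"


# The whole action space is keyboard and mouse events (hypothetical subset).
Action = Union[KeyPress, MouseClick]


class Policy(Protocol):
    def act(self, obs: Screenshot) -> List[Action]: ...


class FakeDesktop:
    """Stand-in environment that logs actions instead of performing real I/O."""

    def __init__(self) -> None:
        self.log: List[Action] = []

    def screenshot(self) -> Screenshot:
        # A blank 4x4 RGB frame; a real environment would capture the screen.
        return Screenshot(width=4, height=4, pixels=bytes(4 * 4 * 3))

    def execute(self, action: Action) -> None:
        self.log.append(action)


class ClickCenterThenEnter:
    """Toy policy: click the centre of the screen, then press Enter."""

    def act(self, obs: Screenshot) -> List[Action]:
        return [MouseClick(obs.width // 2, obs.height // 2), KeyPress("enter")]


def run_episode(policy: Policy, env: FakeDesktop, max_steps: int) -> List[Action]:
    """Observe, decide, act: the policy only ever sees screenshots."""
    for _ in range(max_steps):
        obs = env.screenshot()
        for action in policy.act(obs):
            env.execute(action)
    return env.log
```

In a real agent the toy policy would be replaced by a multimodal model that reads the screenshot, and `FakeDesktop` by a component that captures the screen and emits operating-system input events. The point of the sketch is only that nothing application-specific crosses the interface.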
- WebATLAS: An LLM Agent with Experience-Driven Memory and Action Simulation · October 26, 2025 · NeurIPS 2025 Workshop on Language Agents and World Models
- MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation · April 30, 2025 · NAACL 2025 (System Demonstrations)
- MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research · May 25, 2026 · arXiv
- SE-GA: Memory-Augmented Self-Evolution for GUI Agents · May 16, 2026 · arXiv
- Executable Agentic Memory for GUI Agent · May 12, 2026 · arXiv
- VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation · April 23, 2026 · arXiv
- ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents · April 13, 2026 · arXiv
- EE-MCP: Self-Evolving MCP-GUI Agents via Automated Environment Generation and Experience Learning · April 10, 2026 · arXiv
- Hybrid Self-evolving Structured Memory for GUI Agents · March 11, 2026 · arXiv
- Enhancing Web Agents with a Hierarchical Memory Tree · March 7, 2026 · arXiv
- OpeFlo: Automated UX Evaluation via Simulated Human Web Interaction with GUI Grounding · February 25, 2026 · arXiv
- Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents · February 15, 2026 · arXiv
- VenusBench-Mobile: A Challenging and User-Centric Benchmark for Mobile GUI Agents with Capability Diagnostics · February 6, 2026 · arXiv
- UI-Mem: Self-Evolving Experience Memory for Online Reinforcement Learning in Mobile GUI Agents · February 5, 2026 · arXiv