OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use

Xueyu Hu, Tao Xiong, Biao Yi, Zishu Wei, Ruixuan Xiao, Yurun Chen, Jiasheng Ye, Meiling Tao, Xiangxin Zhou, Ziyu Zhao, Yuhuai Li, Shengze Xu, Shenzhi Wang, Xinchen Xu, Shuofei Qiao, Zhaokai Wang, Kun Kuang, Tieyong Zeng, Liang Wang, Jiwei Li, Yuchen Eleanor Jiang, Wangchunshu Zhou, Guoyin Wang, Keting Yin, Zhou Zhao, Hongxia Yang, Fan Wu, Shengyu Zhang, Fei Wu

🏛 Institutions: ZJU, Fudan, OPPO AI Center, University of Chinese Academy of Sciences, Institute of Automation, CAS, CUHK, Tsinghua, SJTU, 01.AI, PolyU
📅 Date: December 20, 2024
📑 Publisher: ACL 2025
💻 Env: General GUI
🔑 Keywords: survey, architectures, benchmarks, training, safety

TLDR

This survey reviews MLLM-based OS agents across computers, phones, and browsers, covering their environments, observation and action spaces, capabilities, and system designs. It also organizes the benchmark landscape and highlights open problems such as safety, privacy, personalization, and self-evolution.
