OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use
Xueyu Hu, Tao Xiong, Biao Yi, Zishu Wei, Ruixuan Xiao, Yurun Chen, Jiasheng Ye, Meiling Tao, Xiangxin Zhou, Ziyu Zhao, Yuhuai Li, Shengze Xu, Shenzhi Wang, Xinchen Xu, Shuofei Qiao, Zhaokai Wang, Kun Kuang, Tieyong Zeng, Liang Wang, Jiwei Li, Yuchen Eleanor Jiang, Wangchunshu Zhou, Guoyin Wang, Keting Yin, Zhou Zhao, Hongxia Yang, Fan Wu, Shengyu Zhang, Fei Wu
- 🏛 Institutions
- ZJU, Fudan, OPPO AI Center, University of Chinese Academy of Sciences, Institute of Automation, CAS, CUHK, Tsinghua, SJTU, 01.AI, PolyU
- 📅 Date
- December 20, 2024
- 📑 Publisher
- ACL 2025
- 💻 Env
- General GUI
TLDR
This survey reviews MLLM-based OS agents across computers, phones, and browsers, covering their environments, observation and action spaces, capabilities, and system designs. It also organizes the benchmark landscape and highlights open problems such as safety, privacy, personalization, and self-evolution.
Related papers (24)
- GUI Agents: A Survey · December 18, 2024 · Findings of ACL 2025
- A Survey of WebAgents: Towards Next-Generation AI Agents for Web Automation with Large Foundation Models · March 30, 2025 · KDD 2025
- A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron? · May 16, 2025 · arXiv
- Human-Guided Harm Recovery for Computer Use Agents · April 20, 2026 · arXiv
- Are GUI Agents Focused Enough? Automated Distraction via Semantic-level UI Element Injection · April 9, 2026 · arXiv
- LPS-Bench: Benchmarking Safety Awareness of Computer-Use Agents in Long-Horizon Planning under Benign and Adversarial Scenarios · February 3, 2026 · arXiv
- SafePred: A Predictive Guardrail for Computer-Using Agents via World Models · February 2, 2026 · arXiv
- Where Not to Learn: Prior-Aligned Training with Subset-based Attribution Constraints for Reliable Decision-Making · January 30, 2026 · arXiv
- GEM: Gaussian Embedding Modeling for Out-of-Distribution Detection in GUI Agents · May 19, 2025 · arXiv
- A Survey on GUI Agents with Foundation Models Enhanced by Reinforcement Learning · April 29, 2025 · arXiv
- Towards Trustworthy GUI Agents: A Survey · March 30, 2025 · arXiv
- GUI Agents with Foundation Models: A Comprehensive Survey · November 7, 2024 · arXiv
- The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use Agents · April 12, 2026 · arXiv
- CORA: Conformal Risk-Controlled Agents for Safeguarded Mobile GUI Automation · April 10, 2026 · arXiv
- Preference Redirection via Attention Concentration: An Attack on Computer Use Agents · April 9, 2026 · arXiv
- Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal Attacks · March 4, 2026 · arXiv
- When Benign Inputs Lead to Severe Harms: Eliciting Unsafe Unintended Behaviors of Computer-Use Agents · February 9, 2026 · arXiv
- When Actions Go Off-Task: Detecting and Correcting Misaligned Actions in Computer-Use Agents · February 9, 2026 · arXiv
- CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents · January 14, 2026 · arXiv
- WebTrap Park: An Automated Platform for Systematic Security Evaluation of Web Agents · January 13, 2026 · arXiv
- It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents · December 29, 2025 · arXiv
- DECEPTICON: How Dark Patterns Manipulate Web Agents · December 28, 2025 · arXiv
- Permission Manifests for Web Agents · December 7, 2025 · arXiv
- Genesis: Evolving Attack Strategies for LLM Web Agent Red-Teaming · October 21, 2025 · ICME 2026