LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects
Guangyi Liu , Pengxiang Zhao , Yaozhen Liang , Liang Liu , Yaxuan Guo , Han Xiao , Weifeng Lin , Yuxiang Chai , Yue Han , Shuai Ren , Hao Wang , Xiaoyu Liang , WenHao Wang , Tianze Wu , Zhengxi Lu , Siheng Chen , LiLinghao , Guanjing Xiong , Yong Liu , Hongsheng Li
- 🏛 Institutions
- ZJU , vivo AI Lab , CUHK MMLab , SJTU
- 📅 Date
- April 28, 2025
- 📑 Publisher
- TMLR 2025
- 💻 Env
- Mobile
- 🔑 Keywords
TLDR
This survey reviews the development of LLM-powered mobile GUI agents for phone automation, from script-like systems to adaptive multimodal agents. It organizes the space around agent architectures, training approaches, datasets and benchmarks, and closes with open problems such as user adaptation, on-device efficiency, and security.
Related papers (24)
- A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron?May 16, 2025 · arXiv
- A Survey on GUI Agents with Foundation Models Enhanced by Reinforcement LearningApril 29, 2025 · arXiv
- AgentRAE: Remote Action Execution through Notification-based Visual Backdoors against Screenshots-based Mobile GUI AgentsMarch 24, 2026 · arXiv
- Zero-Permission Manipulation: Can We Trust Large Multimodal Model Powered GUI Agents?January 18, 2026 · arXiv
- MobileWorldBench: Towards Semantic World Modeling For Mobile AgentsDecember 16, 2025 · arXiv
- Building a Stable Planner: An Extended Finite State Machine Based Planning Module for Mobile GUI AgentMay 20, 2025 · arXiv
- Evaluating the Robustness of Multimodal Agents Against Active Environmental Injection AttacksFebruary 18, 2025 · ACM MM 2025
- Generalist Virtual Agents: A Survey on Autonomous Agents Across Digital PlatformsNovember 17, 2024 · arXiv
- Demo2Tutorial: From Human Experience to Multimodal Software TutorialsJune 2, 2026 · arXiv
- Executable Agentic Memory for GUI AgentMay 12, 2026 · arXiv
- The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use AgentsApril 12, 2026 · arXiv
- Preference Redirection via Attention Concentration: An Attack on Computer Use AgentsApril 9, 2026 · arXiv
- WebSP-Eval: Evaluating Web Agents on Website Security and Privacy TasksApril 7, 2026 · arXiv
- Visual Confused Deputy: Exploiting and Defending Perception Failures in Computer-Using AgentsMarch 16, 2026 · arXiv
- SlowBA: An efficiency backdoor attack towards VLM-based GUI agentsMarch 9, 2026 · arXiv
- Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal AttacksMarch 4, 2026 · arXiv
- TreeCUA: Efficiently Scaling GUI Automation with Tree-Structured Verifiable EvolutionFebruary 10, 2026 · arXiv
- WebSentinel: Detecting and Localizing Prompt Injection Attacks for Web AgentsFebruary 3, 2026 · arXiv
- SSL: Sweet Spot Learning for Differentiated Guidance in Agentic OptimizationJanuary 30, 2026 · arXiv
- WebATLAS: An LLM Agent with Experience-Driven Memory and Action SimulationOctober 26, 2025 · NeurIPS 2025 Workshop on Language Agents and World Models
- HackWorld: Evaluating Computer-Use Agents on Exploiting Web Application VulnerabilitiesOctober 14, 2025 · ICLR 2026 (Poster)
- Environmental Injection Attacks against GUI Agents in Realistic Dynamic EnvironmentsSeptember 14, 2025 · arXiv
- AgentSentinel: An End-to-End and Real-Time Security Defense Framework for Computer-Use AgentsSeptember 9, 2025 · CCS 2025
- VPI-Bench: Visual Prompt Injection Attacks for Computer-Use AgentsJune 3, 2025 · ICLR 2026 (Poster)