LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects
Guangyi Liu, Pengxiang Zhao, Yaozhen Liang, Liang Liu, Yaxuan Guo, Han Xiao, Weifeng Lin, Yuxiang Chai, Yue Han, Shuai Ren, Hao Wang, Xiaoyu Liang, WenHao Wang, Tianze Wu, Zhengxi Lu, Siheng Chen, LiLinghao, Guanjing Xiong, Yong Liu, Hongsheng Li
- 🏛 Institutions
- ZJU, vivo AI Lab, CUHK MMLab, SJTU
- 📅 Date
- April 28, 2025
- 📑 Publisher
- TMLR 2025
- 💻 Env
- Mobile
- 🔑 Keywords
TLDR
This survey reviews the development of LLM-powered mobile GUI agents for phone automation, from script-like systems to adaptive multimodal agents. It organizes the space around agent architectures, training approaches, datasets and benchmarks, and closes with open problems such as user adaptation, on-device efficiency, and security.
Related papers
- A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron?May 16, 2025 · arXiv
- A Survey on GUI Agents with Foundation Models Enhanced by Reinforcement LearningApril 29, 2025 · arXiv
- AgentRAE: Remote Action Execution through Notification-based Visual Backdoors against Screenshots-based Mobile GUI AgentsMarch 24, 2026 · arXiv
- Zero-Permission Manipulation: Can We Trust Large Multimodal Model Powered GUI Agents?January 18, 2026 · arXiv
- MobileWorldBench: Towards Semantic World Modeling For Mobile AgentsDecember 16, 2025 · arXiv
- Building a Stable Planner: An Extended Finite State Machine Based Planning Module for Mobile GUI AgentMay 20, 2025 · arXiv