GUI Agents with Foundation Models: A Comprehensive Survey
Shuai Wang, Weiwen Liu, Jingxuan Chen, Yuqi Zhou, Weinan Gan, Xingshan Zeng, Yuhan Che, Shuai Yu, Xinlong Hao, Kun Shao, Bin Wang, Chuhan Wu, Yasheng Wang, Ruiming Tang, Jianye Hao
- 🏛 Institutions: Huawei Noah's Ark Lab
- 📅 Date: November 7, 2024
- 📑 Publisher: arXiv
- 💻 Env: General GUI
TLDR
This survey organizes foundation-model GUI agents around data resources, agent construction, taxonomy, and industrial applications. It also summarizes open challenges around the benchmark-reality gap, agent self-evolution, and inference efficiency.
Related papers
- Generalist Virtual Agents: A Survey on Autonomous Agents Across Digital Platforms · November 17, 2024 · arXiv
- How Smart Is Your GUI Agent? A Framework for the Future of Software Interaction · February 12, 2026 · arXiv
- A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron? · May 16, 2025 · arXiv
- A Survey on GUI Agents with Foundation Models Enhanced by Reinforcement Learning · April 29, 2025 · arXiv
- Towards Trustworthy GUI Agents: A Survey · March 30, 2025 · arXiv
- OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use · December 20, 2024 · ACL 2025