GUI Agents: A Survey
Dang Nguyen, Jian Chen, Yu Wang, Gang Wu, Namyong Park, Zhengmian Hu, Hanjia Lyu, Junda Wu, Ryan Aponte, Yu Xia, Xintong Li, Jing Shi, Hongjie Chen, Viet Dac Lai, Zhouhang Xie, Sungchul Kim, Ruiyi Zhang, Tong Yu, Mehrab Tanjim, Nesreen K. Ahmed, Puneet Mathur, Seunghyun Yoon, Lina Yao, Branislav Kveton, Jihyung Kil, Thien Huu Nguyen, Trung Bui, Tianyi Zhou, Ryan A. Rossi, Franck Dernoncourt
- 🏛 Institutions
- UMD, State University of New York at Buffalo, University of Oregon, Adobe Research, University of Rochester, UC San Diego, CMU, Dolby Labs, Cisco Research, University of New South Wales
- 📅 Date
- December 18, 2024
- 📑 Publisher
- Findings of ACL 2025
- 💻 Env
- General GUI
TLDR
This survey organizes GUI-agent research around benchmarks, evaluation metrics, architectures, and training methods for agents powered by large foundation models. It proposes a unified perception-reasoning-planning-acting framework and highlights the open problems that remain across the stack.
Related papers
- OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use · December 20, 2024 · ACL 2025
- A Survey of WebAgents: Towards Next-Generation AI Agents for Web Automation with Large Foundation Models · March 30, 2025 · KDD 2025
- GUIDE: Interpretable GUI Agent Evaluation via Hierarchical Diagnosis · April 6, 2026 · arXiv
- Where Not to Learn: Prior-Aligned Training with Subset-based Attribution Constraints for Reliable Decision-Making · January 30, 2026 · arXiv
- A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron? · May 16, 2025 · arXiv
- A Survey on GUI Agents with Foundation Models Enhanced by Reinforcement Learning · April 29, 2025 · arXiv