A Survey on GUI Agents with Foundation Models Enhanced by Reinforcement Learning
Jiahao Li, Kaer Huang
- 🏛 Institutions
  - Lenovo Research
- 📅 Date
  - April 29, 2025
- 📑 Publisher
  - arXiv
- 💻 Env
  - General GUI
- 🔑 Keywords
TLDR
This survey reviews GUI agents through a reinforcement-learning lens by formalizing GUI interaction as an MDP and organizing prior work around perception, planning, and acting modules. Its main contribution is a training-oriented taxonomy connecting prompt-based methods, supervised fine-tuning, and RL-style policy learning for GUI agents.
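For readers unfamiliar with the MDP framing mentioned in the TLDR, the following is a minimal sketch of the standard textbook formulation applied to GUI interaction. The symbols and the interpretation of each component are assumptions for illustration and are not necessarily the notation the survey itself uses.

```latex
% Standard MDP tuple (textbook definition; GUI readings are illustrative assumptions)
\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma)
% \mathcal{S}: states, e.g. the current screen (screenshot, UI tree) plus task context
% \mathcal{A}: actions, e.g. click, type, scroll
% P(s_{t+1} \mid s_t, a_t): transition dynamics of the GUI environment
% R(s_t, a_t): reward, e.g. a task-completion signal
% \gamma \in [0, 1): discount factor

% The agent's policy \pi_\theta(a_t \mid s_t) is trained to maximize expected discounted return:
J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[ \sum_{t=0}^{T} \gamma^{t} R(s_t, a_t) \right]
```

Under this reading, prompt-based methods keep the policy parameters fixed, supervised fine-tuning fits the policy to demonstrated actions, and RL-style methods optimize an objective of the form $J(\theta)$ directly.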
Related papers
- LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects · April 28, 2025 · TMLR 2025
- OS-Themis: A Scalable Critic Framework for Generalist GUI Rewards · March 19, 2026 · arXiv
- CGL: Advancing Continual GUI Learning via Reinforcement Fine-Tuning · March 3, 2026 · arXiv
- GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL · February 25, 2026 · arXiv
- Building Autonomous GUI Navigation via Agentic-Q Estimation and Step-Wise Policy Optimization · February 14, 2026 · arXiv
- Autonomous Continual Learning of Computer-Use Agents for Environment Adaptation · February 10, 2026 · arXiv