A Survey on GUI Agents with Foundation Models Enhanced by Reinforcement Learning

🏛 Institutions: Lenovo Research
📅 Date: April 29, 2025
📑 Publisher: arXiv
💻 Env: General GUI
🔑 Keywords: survey, reinforcement learning, MDP formulation, training taxonomy, perception-planning-acting

TLDR

This survey reviews GUI agents through a reinforcement-learning lens, formalizing GUI interaction as a Markov decision process (MDP) and organizing prior work around perception, planning, and acting modules. Its main contribution is a training-oriented taxonomy that connects prompt-based methods, supervised fine-tuning, and RL-style policy learning for GUI agents.
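
For orientation, the standard discounted MDP notation that such a formalization typically builds on is sketched below. This is generic textbook notation, not necessarily the survey's own. The GUI reading of each symbol is an assumption made here for illustration: $s_t$ as the current screen state (for example a screenshot or UI tree together with the task instruction), $a_t$ as a GUI action such as a click or keystroke, and $R$ as a task-success signal.

```latex
% Standard discounted MDP. The GUI interpretation of the symbols is an
% illustrative assumption, not taken from the survey.
\[
  \mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma), \qquad
  s_t \in \mathcal{S}, \quad a_t \sim \pi(\cdot \mid s_t), \quad
  s_{t+1} \sim P(\cdot \mid s_t, a_t)
\]
% Objective: expected discounted return of policy \pi over an episode of length T.
\[
  J(\pi) = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{T} \gamma^{t} R(s_t, a_t) \right],
  \qquad 0 \le \gamma \le 1
\]
```

Under this reading, prompt-based methods roughly correspond to using a pretrained model as a fixed policy $\pi$, supervised fine-tuning to imitating demonstrated actions, and RL-style training to optimizing $J(\pi)$ directly.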
