Towards Trustworthy GUI Agents: A Survey
Yucheng Shi, Wenhao Yu, Jingyuan Huang, Wenlin Yao, Wenhu Chen, Ninghao Liu
- 🏛 Institutions: University of Georgia, Tencent AI Seattle Lab, MSR, University of Waterloo, PolyU
- 📅 Date: March 30, 2025
- 📑 Publisher: arXiv
- 💻 Env: General GUI
TLDR
This survey studies trustworthy GUI agents through a workflow-aligned taxonomy that separates trust into Perception Trust, Reasoning Trust, and Interaction Trust. It reviews benign failures, adversarial attacks, defenses, and evaluation practices, arguing that task completion alone is insufficient for trust assessment.
Related papers (24)
- A Survey of WebAgents: Towards Next-Generation AI Agents for Web Automation with Large Foundation Models · March 30, 2025 · KDD 2025
- A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron? · May 16, 2025 · arXiv
- A Survey on GUI Agents with Foundation Models Enhanced by Reinforcement Learning · April 29, 2025 · arXiv
- OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use · December 20, 2024 · ACL 2025
- GUI Agents: A Survey · December 18, 2024 · Findings of ACL 2025
- GUI Agents with Foundation Models: A Comprehensive Survey · November 7, 2024 · arXiv
- LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects · April 28, 2025 · TMLR 2025
- Generalist Virtual Agents: A Survey on Autonomous Agents Across Digital Platforms · November 17, 2024 · arXiv
- ST-WebAgentBench: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents · October 9, 2024 · ICLR 2026 (Poster)
- Naive Visual Memory is Not Enough: A Failure-Mode Study of GUI Agents · June 12, 2026 · arXiv
- Demo2Tutorial: From Human Experience to Multimodal Software Tutorials · June 2, 2026 · arXiv
- STaR-KV: Spatio-Temporal Adaptive Re-weighting for KV Cache Compression in GUI Vision-Language Models · June 1, 2026 · arXiv
- GUI-C²: Coarse-to-Fine GUI Grounding via Difficulty-Aware Reinforcement Learning · May 29, 2026 · arXiv
- MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI Agents · May 18, 2026 · arXiv
- Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining · May 14, 2026 · arXiv
- Executable Agentic Memory for GUI Agent · May 12, 2026 · arXiv
- LiteGUI: Distilling Compact GUI Agents with Reinforcement Learning · May 8, 2026 · arXiv
- Step-level Optimization for Efficient Computer-use Agents · April 29, 2026 · arXiv
- Training Computer Use Agents to Assess the Usability of Graphical User Interfaces · April 28, 2026 · arXiv
- AutoGUI-v2: A Comprehensive Multi-Modal GUI Functionality Understanding Benchmark · April 27, 2026 · arXiv
- Human-Guided Harm Recovery for Computer Use Agents · April 20, 2026 · arXiv
- UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding · April 15, 2026 · arXiv
- GUI-Perturbed: Domain Randomization Reveals Systematic Brittleness in GUI Grounding Models · April 15, 2026 · arXiv
- See, Point, Refine: Multi-Turn Approach to GUI Grounding with Visual Feedback · April 14, 2026 · arXiv