GPA: Learning GUI Process Automation from Demonstrations
Zirui Zhao , Jun Hao Liew , Yan Yang , Wenzhuo Yang , Ziyang Luo , Doyen Sahoo , Silvio Savarese , Junnan Li
- 🏛 Institutions
- Salesforce AI Research
- 📅 Date
- April 2, 2026
- 📑 Publisher
- arXiv
- 💻 Env
- Desktop
- 🔑 Keywords
TLDR
GPA is a vision-based GUI process automation system that enables fast and stable process replay from a single demonstration. Using Sequential Monte Carlo-based localization and readiness calibration, it achieves higher success rates with 10x faster execution than Gemini 3 Pro on long-horizon GUI tasks, running entirely locally without cloud LLMs.
Related papers (24)
- M^2: Dual-Memory Augmentation for Long-Horizon Web Agents via Trajectory Summarization and Insight RetrievalFebruary 28, 2026 · arXiv
- Workflow-GYM: Towards Long-Horizon Evaluation of Computer-use Agentic tasks in Real-World Professional FieldsJune 9, 2026 · arXiv
- HealthAdminBench: Evaluating Computer-Use Agents on Healthcare Administration TasksApril 10, 2026 · arXiv
- Gym-Anything: Turn any Software into an Agent EnvironmentApril 7, 2026 · arXiv
- GUIDE: Resolving Domain Bias in GUI Agents through Real-Time Web Video Retrieval and Plug-and-Play AnnotationMarch 27, 2026 · arXiv
- OS-Marathon: Benchmarking Computer-Use Agents on Long-Horizon Repetitive TasksJanuary 28, 2026 · arXiv
- Are LLM Agents the New RPA? A Comparative Study with RPA Across Enterprise WorkflowsSeptember 4, 2025 · arXiv
- Improved GUI Grounding via Iterative NarrowingNovember 18, 2024 · arXiv
- STaR-KV: Spatio-Temporal Adaptive Re-weighting for KV Cache Compression in GUI Vision-Language ModelsJune 1, 2026 · arXiv
- SimuWoB: Simulating Real-World Mobile Apps for Fast and Faithful GUI Agent BenchmarkingMay 24, 2026 · arXiv
- MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI AgentsMay 18, 2026 · arXiv
- UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI GroundingApril 15, 2026 · arXiv
- CocoaBench: Evaluating Unified Digital Agents in the WildApril 13, 2026 · arXiv
- ClawBench: Can AI Agents Complete Everyday Online Tasks?April 9, 2026 · arXiv
- AndroTMem: From Interaction Trajectories to Anchored Memory in Long-Horizon GUI AgentsMarch 19, 2026 · arXiv
- Zoom to Essence: Trainless GUI Grounding by Inferring upon Interface ElementsMarch 15, 2026 · arXiv
- Trifuse: Enhancing Attention-Based GUI Grounding via Multimodal FusionFebruary 6, 2026 · arXiv
- Darwinian Memory: A Training-Free Self-Regulating Memory System for GUI Agent EvolutionJanuary 30, 2026 · arXiv
- MobileBench-OL: A Comprehensive Chinese Benchmark for Evaluating Mobile GUI Agents in Real-World EnvironmentJanuary 28, 2026 · arXiv
- LongHorizonUI: A Unified Framework for Robust long-horizon Task Automation of GUI AgentJanuary 26, 2026 · ICLR 2026 (Poster)
- MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive and MCP-Augmented EnvironmentsDecember 22, 2025 · arXiv
- MVP: Multiple View Prediction Improves GUI GroundingDecember 9, 2025 · arXiv
- Zoom in, Click out: Unlocking and Evaluating the Potential of Zooming for GUI GroundingDecember 5, 2025 · arXiv
- Training High-Level Schedulers with Execution-Feedback Reinforcement Learning for Long-Horizon GUI AutomationNovember 27, 2025 · CVPR 2026