GPA: Learning GUI Process Automation from Demonstrations
Zirui Zhao, Jun Hao Liew, Yan Yang, Wenzhuo Yang, Ziyang Luo, Doyen Sahoo, Silvio Savarese, Junnan Li
- 🏛 Institutions
- Salesforce AI Research
- 📅 Date
- April 2, 2026
- 📑 Publisher
- arXiv
- 💻 Env
- Desktop
- 🔑 Keywords
TLDR
GPA is a vision-based GUI process automation system that enables fast and stable process replay from a single demonstration. Using Sequential Monte Carlo-based localization and readiness calibration, it achieves higher success rates with 10x faster execution than Gemini 3 Pro on long-horizon GUI tasks, running entirely locally without cloud LLMs.
Related papers
- M^2: Dual-Memory Augmentation for Long-Horizon Web Agents via Trajectory Summarization and Insight RetrievalFebruary 28, 2026 · arXiv
- HealthAdminBench: Evaluating Computer-Use Agents on Healthcare Administration TasksApril 10, 2026 · arXiv
- Gym-Anything: Turn any Software into an Agent EnvironmentApril 7, 2026 · arXiv
- GUIDE: Resolving Domain Bias in GUI Agents through Real-Time Web Video Retrieval and Plug-and-Play AnnotationMarch 27, 2026 · arXiv
- OS-Marathon: Benchmarking Computer-Use Agents on Long-Horizon Repetitive TasksJanuary 28, 2026 · arXiv
- Are LLM Agents the New RPA? A Comparative Study with RPA Across Enterprise WorkflowsSeptember 4, 2025 · arXiv