VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation
Qijun Han, Haoqin Tu, Zijun Wang, Haoyue Dai, Yiyang Zhou, Nancy Lau, Alvaro A. Cardenas, Yuhui Xu, Ran Xu, Caiming Xiong, Zeyu Zheng, Huaxiu Yao, Yuyin Zhou, Cihang Xie
- 🏛 Institutions
- UCSC, CMU, UNC, Salesforce, UC Berkeley
- 📅 Date
- April 23, 2026
- 📑 Publisher
- arXiv
- 💻 Env
- Desktop
- 🔑 Keywords
TLDR
VLAA-GUI tackles two recurring failure modes of autonomous GUI agents, premature task termination and unproductive action loops, with a modular framework that decides when to Stop, Recover, and Search. The system reaches 77.5% on OSWorld and 61.0% on WindowsAgentArena, the top result on both benchmarks, and holds that position with multiple LLM backbones.
Related papers
- EE-MCP: Self-Evolving MCP-GUI Agents via Automated Environment Generation and Experience Learning · April 10, 2026 · arXiv
- ShowUI-Aloha: Human-Taught GUI Agent · January 12, 2026 · arXiv
- Surfer 2: The Next Generation of Cross-Platform Computer Use Agents · October 22, 2025 · arXiv
- BIMgent: Towards Autonomous Building Modeling via Computer-use Agents · June 8, 2025 · ICML 2025 Workshop on Computer-use Agents
- LiteCUA: Computer as MCP Server for Computer-Use Agent on AIOS · May 24, 2025 · arXiv
- UFO2: The Desktop AgentOS · April 20, 2025 · arXiv