VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation

Qijun Han, Haoqin Tu, Zijun Wang, Haoyue Dai, Yiyang Zhou, Nancy Lau, Alvaro A. Cardenas, Yuhui Xu, Ran Xu, Caiming Xiong, Zeyu Zheng, Huaxiu Yao, Yuyin Zhou, Cihang Xie

🏛 Institutions: UCSC, CMU, UNC, Salesforce, UC Berkeley
📅 Date: April 23, 2026
📑 Publisher: arXiv
💻 Env: Desktop
🔑 Keywords: framework, early stopping, recovery, VLAA-GUI

TLDR

VLAA-GUI tackles two recurring failure modes of autonomous GUI agents, premature task termination and unproductive action loops, with a modular framework that decides when to Stop, Recover, and Search. The system reaches 77.5% on OSWorld and 61.0% on WindowsAgentArena, achieving top performance on both benchmarks with multiple LLM backbones.
