GUI Agents Papers

VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation

Qijun Han, Haoqin Tu, Zijun Wang, Haoyue Dai, Yiyang Zhou, Nancy Lau, Alvaro A. Cardenas, Yuhui Xu, Ran Xu, Caiming Xiong, Zeyu Zheng, Huaxiu Yao, Yuyin Zhou, Cihang Xie

🏛 Institutions
UCSC, CMU, UNC, Salesforce, UC Berkeley
📅 Date
April 23, 2026
📑 Publisher
arXiv
💻 Env
Desktop
TLDR

VLAA-GUI tackles two recurring failure modes of autonomous GUI agents, premature task termination and unproductive action loops, with a modular framework that decides when to Stop, Recover, and Search. The system reaches 77.5% on OSWorld and 61.0% on WindowsAgentArena, and is reported as top-performing on both benchmarks with multiple LLM backbones.
