GUI Agents Papers
Star · 751

GUIDE: Interpretable GUI Agent Evaluation via Hierarchical Diagnosis

Yuwen Zhai, Runze Li, Liang Wang, Nian Shi, Liwu Xu, Wei Zhang, Ran Lin, Bo Xu, Benlei Cui

🏛 Institutions
Unknown
📅 Date
April 6, 2026
📑 Publisher
arXiv
💻 Env
General GUI
🔑 Keywords
TLDR

GUIDE decomposes GUI agent trajectory evaluation into three sequential stages — trajectory segmentation, subtask diagnosis, and structured error analysis — mirroring the compositional structure of GUI tasks. Evaluated on 932 industrial e-commerce trajectories, AGENTREWARDBENCH, and AndroidBench, it improves accuracy by up to 5.35 points over baselines while producing diagnostic insights.

Open paper arXiv Edit on GitHub Report issue
Related papers