GUI Agents Papers

GUIDE: Interpretable GUI Agent Evaluation via Hierarchical Diagnosis

Yuwen Zhai, Runze Li, Liang Wang, Nian Shi, Liwu Xu, Wei Zhang, Ran Lin, Bo Xu, Benlei Cui

🏛 Institutions
Unknown
📅 Date
April 6, 2026
📑 Publisher
arXiv
💻 Env
General GUI
🔑 Keywords
TLDR

GUIDE decomposes GUI agent trajectory evaluation into three sequential stages (trajectory segmentation, subtask diagnosis, and structured error analysis), mirroring the compositional structure of GUI tasks. Evaluated on 932 industrial e-commerce trajectories, AgentRewardBench, and AndroidBench, it improves evaluation accuracy by up to 5.35 points over baselines while also producing interpretable diagnostic output.
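The three-stage decomposition can be pictured as a simple pipeline. The Python sketch below is a hypothetical illustration, not GUIDE's implementation: the `Step` fields, the grouping of contiguous steps by a subtask label, the "last step succeeded" pass rule, and the report keys are all assumptions standing in for the paper's actual stages, which this summary does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    subtask: str     # hypothetical label: which subtask this action serves
    action: str      # hypothetical description of the GUI action
    succeeded: bool  # hypothetical per-step outcome signal


@dataclass
class Diagnosis:
    subtask: str
    passed: bool
    failed_actions: list[str] = field(default_factory=list)


def segment(trajectory: list[Step]) -> list[list[Step]]:
    """Stage 1 (stand-in): split the trajectory into contiguous runs sharing a subtask label."""
    segments: list[list[Step]] = []
    for step in trajectory:
        if segments and segments[-1][-1].subtask == step.subtask:
            segments[-1].append(step)
        else:
            segments.append([step])
    return segments


def diagnose(steps: list[Step]) -> Diagnosis:
    """Stage 2 (stand-in): a subtask passes only if its final step succeeded."""
    failed = [s.action for s in steps if not s.succeeded]
    return Diagnosis(steps[0].subtask, steps[-1].succeeded, failed)


def analyze(diagnoses: list[Diagnosis]) -> dict:
    """Stage 3 (stand-in): aggregate subtask verdicts into a structured error report."""
    first_failure = next((d.subtask for d in diagnoses if not d.passed), None)
    return {
        "task_success": first_failure is None,
        "first_failed_subtask": first_failure,
        "subtasks": [(d.subtask, d.passed) for d in diagnoses],
        "errors": [(d.subtask, a) for d in diagnoses for a in d.failed_actions],
    }


def evaluate(trajectory: list[Step]) -> dict:
    """Run the three stages in sequence: segment, diagnose each segment, analyze."""
    return analyze([diagnose(seg) for seg in segment(trajectory)])


if __name__ == "__main__":
    traj = [
        Step("search", "type query", True),
        Step("search", "press enter", True),
        Step("add_to_cart", "tap item", True),
        Step("add_to_cart", "tap add", False),
    ]
    print(evaluate(traj)["first_failed_subtask"])  # → add_to_cart
```

The point of the hierarchy is that the final verdict is traceable: instead of a single pass/fail label, the report names which subtask failed and which action inside it went wrong. In the paper each stage is presumably far richer than these rule-based stand-ins.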
