GUI Agents Papers

AutoGUI-v2: A Comprehensive Multi-Modal GUI Functionality Understanding Benchmark

Hongxin Li, Xiping Wang, Jingran Su, Zheng Ju, Yuntao Chen, Qing Li, Zhaoxiang Zhang

🏛 Institutions
UCAS, CASIA, PolyU, Shanghai AI Laboratory
📅 Date
April 27, 2026
📑 Publisher
arXiv
💻 Env
General GUI
🔑 Keywords
TLDR

AutoGUI-v2 unifies region-level semantics, element grounding, and state prediction in a single benchmark of 2,753 tasks spanning six operating systems, addressing the split between black-box task-completion benchmarks and shallow grounding benchmarks. Open-source models perform best at functional grounding and commercial models at functionality description, but all models struggle with the complex interaction logic of uncommon actions.
