GUI Agents Papers

GUI-CEval: A Hierarchical and Comprehensive Chinese Benchmark for Mobile GUI Agents

Yang Li, Yuchen Liu, Haoyu Lu, Zhiqiang Xia, Hongzhen Wang, Kaiyang Han, Changpeng Yang, Jinyang Wu, Jiaming Xu, Runyu Shi, Ying Huang

🏛 Institutions
HyperAI Team, Xiaomi
📅 Date
March 16, 2026
📑 Publisher
CVPR 2026
💻 Env
Mobile
TLDR

GUI-CEval is the first comprehensive Chinese benchmark for mobile GUI agents. It spans 201 apps across four device types and uses a hierarchical two-level evaluation structure (atomic abilities and application-level tasks) along five dimensions: perception, planning, reflection, execution, and evaluation. The results show that most MLLMs still struggle with reflective decision-making and post-action self-evaluation.
