GUI Agents Papers

GUI-CEval: A Hierarchical and Comprehensive Chinese Benchmark for Mobile GUI Agents

Yang Li, Yuchen Liu, Haoyu Lu, Zhiqiang Xia, Hongzhen Wang, Kaiyang Han, Changpeng Yang, Jinyang Wu, Jiaming Xu, Runyu Shi, Ying Huang

🏛 Institutions
HyperAI Team, Xiaomi
📅 Date
March 16, 2026
📑 Publisher
CVPR 2026
💻 Env
Mobile
TLDR

GUI-CEval is the first comprehensive Chinese benchmark for mobile GUI agents, spanning 201 apps across four device types. Its hierarchical evaluation has two levels (atomic abilities and application-level tasks) and covers five dimensions (perception, planning, reflection, execution, evaluation). The results show that most MLLMs still struggle with reflective decision-making and post-action self-evaluation.
