VenusBench-Mobile: A Challenging and User-Centric Benchmark for Mobile GUI Agents with Capability Diagnostics

Yichen Gong, Zhuohan Cai, Sunhao Dai, Yuqi Zhou, Zhangxuan Gu, Changhua Meng, Shuheng Shen

🏛 Institutions: Ant Group, RUC
📅 Date: February 6, 2026
📑 Publisher: arXiv
💻 Env: Mobile
🔑 Keywords: benchmark, user-centric, capability diagnostics, perception, memory, VenusBench-Mobile

TLDR

VenusBench-Mobile addresses the app-centric, task-homogeneous nature of prior mobile GUI benchmarks with a user-intent-driven task design and a capability-oriented annotation scheme that supports fine-grained behavior analysis. State-of-the-art mobile GUI agents score substantially lower on it than on existing benchmarks. Failures are dominated by perception and memory deficiencies, and success is near zero under environment variations, which indicates persistent brittleness in realistic conditions.
