
Mobile-Bench-v2: A More Realistic and Comprehensive Benchmark for VLM-based Mobile Agents

Weikai Xu, Zhizheng Jiang, Yuxuan Liu, Pengzhi Gao, Wei Liu, Jian Luan, Yuanchun Li, Yunxin Liu, Bin Wang, Bo An

🏛 Institutions
NTU, University of Electronic Science and Technology of China, Renmin University of China, XiaoMi AI Lab, Institute for AI Industry Research (AIR), Tsinghua
📅 Date
May 17, 2025
📑 Publisher
arXiv
💻 Env
Mobile
🔑 Keywords
TLDR

Mobile-Bench-v2 is a more realistic mobile-agent benchmark that fixes three weaknesses of earlier evaluation: single-path scoring, unrealistically clean environments, and over-specified instructions. It adds multi-path offline evaluation, noisy app settings with pop-ups and ads, and ambiguous-instruction splits for testing proactive interaction.
