MobileBench-OL: A Comprehensive Chinese Benchmark for Evaluating Mobile GUI Agents in Real-World Environment

Qinzhuo Wu, Zhizhuo Yang, Hanhao Li, Pengzhi Gao, Wei Liu, Jian Luan

🏛 Institutions
MiLM Plus, Xiaomi, PKU, CUHK
📅 Date
January 28, 2026
📑 Publisher
arXiv
💻 Env
Mobile
🔑 Keywords
TLDR

MobileBench-OL benchmarks mobile GUI agents on 1,080 online tasks from 80 Chinese apps. It extends evaluation beyond instruction following to long-horizon execution, reasoning and exploration, and robustness to real-world noise, and pairs the benchmark with an automatic evaluation pipeline that supports environment reset.
