Modular and Multi-Path-Aware Offline Benchmarking for Mobile GUI Agents

Youngmin Im, Byeongung Jo, Jaeyoung Wi, Seungwoo Baek, Tae Hoon Min, Joo Hyung Lee, Sangeun Oh, Insik Shin, Sunjae Lee

🏛 Institutions: KAIST, Sungkyunkwan University, Korea University, Fluiz
📅 Date: December 14, 2025
📑 Publisher: arXiv
💻 Env: Mobile
🔑 Keywords: benchmark, offline evaluation, modular analysis, multi-path evaluation, MobiBench

TLDR

MobiBench is an offline mobile-agent benchmark that explicitly supports multiple valid action paths and evaluates agent modules separately rather than treating the system as a black box. The paper reports 94.72% agreement with human evaluators while preserving the scalability and reproducibility advantages of offline evaluation.
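
To make the multi-path idea concrete, here is a minimal sketch of how an offline scorer might accept more than one valid action per step. It is an illustration under stated assumptions, not MobiBench's actual implementation: the `Action` fields, `step_is_correct`, and `step_accuracy` are hypothetical names, and exact-match comparison stands in for whatever matching rule the paper uses.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical action record; the field names are assumptions."""
    kind: str          # e.g. "tap", "type", "scroll"
    target: str = ""   # illustrative UI element identifier
    text: str = ""     # typed text, if any


def step_is_correct(predicted: Action, valid_actions: set[Action]) -> bool:
    # Multi-path-aware check: the step counts as correct if the prediction
    # matches ANY annotated valid action, not only one reference action.
    return predicted in valid_actions


def step_accuracy(predictions: list[Action],
                  valid_sets: list[set[Action]]) -> float:
    # Fraction of steps whose prediction falls in that step's valid set.
    if len(predictions) != len(valid_sets):
        raise ValueError("one valid-action set is required per prediction")
    if not predictions:
        return 0.0
    hits = sum(step_is_correct(p, v) for p, v in zip(predictions, valid_sets))
    return hits / len(predictions)


# Illustrative data: step 1 has two acceptable ways to open search.
valid_sets = [
    {Action("tap", "search_icon"), Action("tap", "search_bar")},
    {Action("type", "search_bar", "pizza")},
]
predictions = [
    Action("tap", "search_bar"),             # matches an alternative path
    Action("type", "search_bar", "pasta"),   # wrong text, no match
]
print(step_accuracy(predictions, valid_sets))
```

A single-path scorer that accepted only `tap search_icon` at step 1 would mark the first prediction wrong even though it reaches the same screen. Avoiding that kind of false negative is the motivation for the multi-path setup the summary describes.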
