GUI Agents Papers

SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation

Jingxuan Chen, Derek Yuen, Bin Xie, Yuhao Yang, Gongwei Chen, Zhihao Wu, Yixing Li, Xurui Zhou, Weiwen Liu, Shuai Wang, Kaiwen Zhou, Rui Shao, Liqiang Nie, Yasheng Wang, Jianye Hao, Jun Wang, Kun Shao

🏛 Institutions
Huawei Noah's Ark Lab, HIT-Shenzhen, Tianjin University, UCL
📅 Date
October 19, 2024
📑 Publisher
ICLR 2025 (Spotlight)
💻 Env
Mobile
🔑 Keywords
TLDR

SPA-Bench is a smartphone-agent benchmark of 340 Android tasks spanning single-app and cross-app settings in English and Chinese, covering both system and third-party apps. It also provides a plug-and-play framework for running agents and an automatic evaluation pipeline that scores them on seven metrics covering task completion and resource usage. Its results expose persistent difficulties in interpreting mobile UIs, grounding actions, and executing long-horizon tasks.
