GUI Agents Papers

SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation

Jingxuan Chen, Derek Yuen, Bin Xie, Yuhao Yang, Gongwei Chen, Zhihao Wu, Yixing Li, Xurui Zhou, Weiwen Liu, Shuai Wang, Kaiwen Zhou, Rui Shao, Liqiang Nie, Yasheng Wang, Jianye Hao, Jun Wang, Kun Shao

🏛 Institutions
Huawei Noah's Ark Lab, HIT-Shenzhen, Tianjin University, UCL
📅 Date
October 19, 2024
📑 Publisher
ICLR 2025 (Spotlight)
💻 Env
Mobile
TLDR

SPA-Bench is a smartphone-agent benchmark built around 340 Android tasks that span single-app and cross-app settings, in both English and Chinese, across system and third-party apps. It also provides a plug-and-play execution framework and an automatic evaluation pipeline with seven metrics covering task completion and resource usage. The benchmark exposes persistent difficulties in mobile UI interpretation, grounding, and long-horizon execution.
