GUI Agents Papers

A3: Android Agent Arena for Mobile GUI Agents with Essential-State Procedural Evaluation

Yuxiang Chai, Shunye Tang, Han Xiao, Weifeng Lin, Hanhao Li, Jiayu Zhang, Liang Liu, Pengxiang Zhao, Guangyi Liu, Guozhi Wang, Shuai Ren, Rongduo Han, Haining Zhang, Siyuan Huang, Hongsheng Li

🏛 Institutions
CUHK, vivo AI Lab, SJTU
📅 Date
January 2, 2025
📑 Publisher
arXiv
💻 Env
Mobile
🔑 Keywords
TLDR

A3 is a mobile GUI benchmark of 100 tasks across 20 dynamic, online Android apps, designed to evaluate agents beyond static or offline settings. Its essential-state procedural evaluation uses MLLMs as reward models to verify both intermediate progress and final task completion.
