
A3: Android Agent Arena for Mobile GUI Agents with Essential-State Procedural Evaluation

Yuxiang Chai, Shunye Tang, Han Xiao, Weifeng Lin, Hanhao Li, Jiayu Zhang, Liang Liu, Pengxiang Zhao, Guangyi Liu, Guozhi Wang, Shuai Ren, Rongduo Han, Haining Zhang, Siyuan Huang, Hongsheng Li

🏛 Institutions
CUHK, vivo AI Lab, SJTU
📅 Date
January 2, 2025
📑 Publisher
arXiv
💻 Env
Mobile
🔑 Keywords
TLDR

A3 is a mobile GUI benchmark built from 100 tasks over 20 dynamic online Android apps to evaluate agents beyond static or offline settings. Its essential-state procedural evaluation uses MLLMs as reward models to verify both intermediate progress and final completion on real online apps.
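
The idea of checking intermediate progress as well as final completion can be illustrated with a minimal sketch. It is not the paper's implementation: the in-order matching rule, the progress fraction, and every name below (`Judge`, `EvalResult`, `evaluate_trajectory`, `keyword_judge`) are assumptions made for illustration, and the keyword matcher is only a stand-in for the MLLM reward model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical judge signature: (observation, essential_state) -> bool.
# In A3 this role is played by an MLLM reward model; here it is a stand-in.
Judge = Callable[[str, str], bool]


@dataclass
class EvalResult:
    reached: List[bool]  # one flag per essential state, in task order
    progress: float      # fraction of essential states reached
    completed: bool      # True only when every essential state was reached


def evaluate_trajectory(
    observations: Sequence[str],
    essential_states: Sequence[str],
    judge: Judge,
) -> EvalResult:
    """Match essential states against observations in order (assumed rule)."""
    reached: List[bool] = []
    cursor = 0  # a later state may only match at or after the previous match
    for state in essential_states:
        hit = False
        for i in range(cursor, len(observations)):
            if judge(observations[i], state):
                hit = True
                cursor = i
                break
        reached.append(hit)
        if not hit:
            # States after the first miss are counted as not reached.
            reached.extend([False] * (len(essential_states) - len(reached)))
            break
    total = len(essential_states)
    progress = sum(reached) / total if total else 0.0
    return EvalResult(reached, progress, total > 0 and all(reached))


def keyword_judge(observation: str, state: str) -> bool:
    # Toy stand-in for an MLLM call: case-insensitive substring match.
    return state.lower() in observation.lower()


observations = [
    "home screen",
    "search page: query typed",
    "results list shown",
]
essential_states = ["search page", "results list", "item detail"]
result = evaluate_trajectory(observations, essential_states, keyword_judge)
print(result.reached, round(result.progress, 2), result.completed)
```

In this toy run the agent reaches the first two essential states but not the third, so it receives partial progress credit while the task is still marked incomplete.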
