GUI Agents Papers

AndroidDaily: A Verifiable Benchmark for Mobile GUI Agents on Real-World Closed-Source Applications

Yifan Sui, Xin Huang, Hongbing Li, Fang Xu, Jiahe Lv, Haolong Yan, Yeqing Shen, Litao Liu, Zhimin Fan, Ziyang Meng, Jia Wang, Junbo Qi, Kaijun Tan, Zheng Ge, Xiangyu Zhang, Daxin Jiang, Osamu Yoshie

🏛 Institutions
BUPT, StepFun, Waseda
📅 Date
May 26, 2026
📑 Publisher
arXiv
💻 Env
Mobile
TLDR

AndroidDaily is a verifiable benchmark of 350 daily-use tasks across 94 commercial, closed-source Android apps for evaluating mobile GUI agents. It introduces GRADE, an evaluator that judges agents by tracking the visual trajectory against observable external guidelines rather than internal app state, reaching 87.37% agreement with human judgment.
