GUI Agents Papers

SMAN-Bench: A Cross-System Benchmark for Mobile Agents under Single- and Multi-path, Ambiguous, and Noisy Tasks

Weikai Xu, Zhizheng Jiang, Yuxuan Liu, Pengzhi Gao, Wei Liu, Jian Luan, Yunxin Liu, Yuanchun Li, Bin Wang, Bo An

🏛 Institutions
NTU, University of Electronic Science and Technology of China, Renmin University of China, MiLM Plus, Xiaomi, Institute for AI Industry Research, Tsinghua
📅 Date
January 26, 2026
📑 Publisher
ICLR 2026 (Poster)
💻 Env
Mobile
TLDR

SMAN-Bench evaluates mobile agents under single-path, multi-path, ambiguous, and noisy task settings that are poorly covered by prior benchmarks. It builds these splits from a graph-structured unlabeled mobile corpus, adds offline multi-path reward evaluation, and includes both contaminated noisy environments and preset Q&A interactions for ambiguous instructions.
