SMAN-Bench: A Cross-System Benchmark for Mobile Agents under Single- and Multi-path, Ambiguous, and Noisy Tasks

Weikai Xu, Zhizheng Jiang, Yuxuan Liu, Pengzhi Gao, Wei Liu, Jian Luan, Yunxin Liu, Yuanchun Li, Bin Wang, Bo An

🏛 Institutions: NTU, University of Electronic Science and Technology of China, Renmin University of China, MiLM Plus, Xiaomi, Institute for AI Industry Research, Tsinghua
📅 Date: January 26, 2026
📑 Publisher: ICLR 2026 (Poster)
💻 Env: Mobile
🔑 Keywords: benchmark, dataset, multi-path evaluation, ambiguous instructions, noisy environment, SMAN-Bench

TLDR

SMAN-Bench evaluates mobile agents under single-path, multi-path, ambiguous, and noisy task settings that are poorly covered by prior benchmarks. It builds these splits from a graph-structured unlabeled mobile corpus, adds offline multi-path reward evaluation, and includes both contaminated noisy environments and preset Q&A interactions for ambiguous instructions.
