MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive and MCP-Augmented Environments

Quyu Kong, Xu Zhang, Zhenyu Yang, Nolan Gao, Chen Liu, Panrong Tong, Chenglin Cai, Hanzhang Zhou, Jianan Zhang, Liangyu Chen, Zhidan Liu, Steven Hoi, Yue Wang

🏛 Institutions: Tongyi Lab, Alibaba Group, HKUST(GZ), University of Florida
📅 Date: December 22, 2025
📑 Publisher: arXiv
💻 Env: Mobile
🔑 Keywords: benchmark, agent-user interaction, MCP, cross-app workflows, long-horizon tasks, MobileWorld

TLDR

MobileWorld is a harder mobile-agent benchmark built to move beyond AndroidWorld by adding longer cross-app workflows, explicit user interaction, and MCP-augmented tool use. Across 201 tasks over 20 apps, it shows that current agents remain weak at clarification, memory, tool integration, and long-horizon coordination.
