MAI-UI Technical Report: Real-World Centric Foundation GUI Agents

Hanzhang Zhou, Xu Zhang, Panrong Tong, Jianan Zhang, Liangyu Chen, Quyu Kong, Chenglin Cai, Chen Liu, Yue Wang, Jingren Zhou, Steven Hoi

🏛 Institutions: Tongyi Lab, Alibaba Group
📅 Date: December 26, 2025
📑 Publisher: arXiv
💻 Env: Desktop, Mobile
🔑 Keywords: model, agent-user interaction, MCP, device-cloud collaboration, online reinforcement learning, MAI-UI

TLDR

MAI-UI is a family of foundation GUI agents aimed at realistic deployment rather than benchmark-only optimization. It extends pure UI control with agent-user interaction, MCP tool calls, native device-cloud collaboration, and long-horizon online reinforcement learning, and reports strong results on GUI grounding, AndroidWorld, and MobileWorld.
