MAI-UI Technical Report: Real-World Centric Foundation GUI Agents
Hanzhang Zhou, Xu Zhang, Panrong Tong, Jianan Zhang, Liangyu Chen, Quyu Kong, Chenglin Cai, Chen Liu, Yue Wang, Jingren Zhou, Steven Hoi
- 🏛 Institutions
- Tongyi Lab, Alibaba Group
- 📅 Date
- December 26, 2025
- 📑 Publisher
- arXiv
- 💻 Env
- Desktop Mobile
- 🔑 Keywords
TLDR
MAI-UI is a foundation GUI-agent family aimed at realistic deployment rather than benchmark-only optimization. It extends pure UI control with agent-user interaction, MCP tool calls, native device-cloud collaboration, and long-horizon online reinforcement learning, and sets strong results on grounding, AndroidWorld, and MobileWorld.
Related papers
- MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive and MCP-Augmented EnvironmentsDecember 22, 2025 · arXiv
- UltraCUA: A Foundation Model for Computer Use Agents with Hybrid ActionOctober 20, 2025 · arXiv
- Mobile-Agent-v3.5: Multi-platform Fundamental GUI AgentsFebruary 15, 2026 · arXiv
- OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task ExecutionJanuary 28, 2026 · arXiv
- Step-GUI Technical ReportDecember 17, 2025 · arXiv
- ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform DataSeptember 18, 2025 · ICLR 2026 (Oral)