Magma: A Foundation Model for Multimodal AI Agents
Jianwei Yang, Reuben Tan, Qianhui Wu, Ruijie Zheng, Baolin Peng, Yongyuan Liang, Yu Gu, Mu Cai, Seonghyeon Ye, Joel Jang, Yuquan Deng, Lars Liden, Jianfeng Gao
- 🏛 Institutions
- Microsoft Research, University of Maryland, University of Wisconsin-Madison, KAIST, University of Washington
- 📅 Date
- February 18, 2025
- 📑 Publisher
- CVPR 2025
TLDR
Magma is a multimodal foundation model for agentic tasks spanning digital and physical environments, and the paper is not GUI-specific. It is relevant here because it reports strong UI navigation results and uses Set-of-Mark and Trace-of-Mark supervision. However, its main contribution is a broader agentic model that covers robotics as well as GUI tasks.
Related papers (10)
- Training Computer Use Agents to Assess the Usability of Graphical User Interfaces · April 28, 2026 · arXiv
- ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents · April 13, 2026 · arXiv
- MolmoWeb: Open Visual Web Agent and Open Data for the Open Web · April 9, 2026 · arXiv
- IntentScore: Intent-Conditioned Action Evaluation for Computer-Use Agents · April 6, 2026 · arXiv
- SecAgent: Efficient Mobile GUI Agent with Semantic Context · March 9, 2026 · arXiv
- Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents · February 15, 2026 · arXiv
- UI-Oceanus: Scaling GUI Agents with Synthetic Environmental Dynamics · February 11, 2026 · arXiv
- OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution · January 28, 2026 · arXiv
- EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience · January 22, 2026 · arXiv
- ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands · December 31, 2025 · arXiv