GUI Agents Papers

Magma: A Foundation Model for Multimodal AI Agents

Jianwei Yang, Reuben Tan, Qianhui Wu, Ruijie Zheng, Baolin Peng, Yongyuan Liang, Yu Gu, Mu Cai, Seonghyeon Ye, Joel Jang, Yuquan Deng, Lars Liden, Jianfeng Gao

🏛 Institutions
Microsoft Research, University of Maryland, University of Wisconsin-Madison, KAIST, University of Washington
📅 Date
February 18, 2025
📑 Publisher
CVPR 2025
💻 Env
🔑 Keywords
TLDR

Magma is a multimodal foundation model for agentic tasks in both digital and physical environments; the paper is not GUI-specific. It is listed here because it reports strong UI navigation results and uses Set-of-Mark and Trace-of-Mark supervision, but its main contribution is a general agentic model that covers robotics as well as GUI tasks.
