GUI Agents Papers

Magma: A Foundation Model for Multimodal AI Agents

Jianwei Yang, Reuben Tan, Qianhui Wu, Ruijie Zheng, Baolin Peng, Yongyuan Liang, Yu Gu, Mu Cai, Seonghyeon Ye, Joel Jang, Yuquan Deng, Lars Liden, Jianfeng Gao

🏛 Institutions
Microsoft Research, University of Maryland, University of Wisconsin-Madison, KAIST, University of Washington
📅 Date
February 18, 2025
📑 Publisher
CVPR 2025
💻 Env
🔑 Keywords
TLDR

Magma is a multimodal foundation model for agentic tasks spanning both digital and physical environments; it is not a GUI-specific paper. It is relevant here because it reports strong UI navigation results and is trained with Set-of-Mark (SoM) and Trace-of-Mark (ToM) supervision, but its main contribution is a general agentic model that covers robotics as well as GUI tasks.
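The Set-of-Mark and Trace-of-Mark ideas named in the TLDR can be illustrated with a minimal sketch. It assumes candidate UI elements (or tracked points in a video) are already available as bounding boxes or coordinates, and that the model predicts a mark number rather than raw pixel coordinates. Every function name here is hypothetical, and this is a generic illustration of the two ideas, not Magma's actual training or inference pipeline.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box of a candidate UI element (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def center(self) -> tuple[float, float]:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def assign_marks(boxes: list[Box]) -> dict[int, Box]:
    """Set-of-Mark style labeling: give each candidate region a numeric mark starting at 1."""
    return {i + 1: box for i, box in enumerate(boxes)}


def marks_to_prompt(marks: dict[int, Box]) -> str:
    """Serialize marks as text. In SoM the numbers are also drawn onto the screenshot."""
    lines = []
    for mark_id, box in marks.items():
        cx, cy = box.center()
        lines.append(f"[{mark_id}] center=({cx:.0f}, {cy:.0f})")
    return "\n".join(lines)


def ground_action(mark_id: int, marks: dict[int, Box]) -> tuple[float, float]:
    """Map a predicted mark ID back to a click coordinate (here, the box center)."""
    return marks[mark_id].center()


def trace_of_mark_target(
    tracks: dict[int, list[tuple[float, float]]], mark_id: int, t: int, horizon: int
) -> list[tuple[float, float]]:
    """Trace-of-Mark style target (assumed form): a mark's positions over the next `horizon` frames after frame t."""
    return tracks[mark_id][t + 1 : t + 1 + horizon]


if __name__ == "__main__":
    marks = assign_marks([Box(0, 0, 10, 10), Box(20, 20, 40, 60)])
    print(marks_to_prompt(marks))
    print(ground_action(2, marks))  # click point for a predicted mark "2"
```

The design point the sketch tries to capture is that predicting a small integer is an easier output space than predicting free-form coordinates, while the trace target turns "where will this point move next" into a supervised sequence.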
