GUI Agents Papers

MolmoWeb: Open Visual Web Agent and Open Data for the Open Web

Tanmay Gupta, Piper Wolters, Zixian Ma, Peter Sushko, Rock Yuren Pang, Diego Llanes, Yue Yang, Taira Anderson, Boyuan Zheng, Zhongzheng Ren, Harsh Trivedi, Taylor Blanton, Caleb Ouellette, Winson Han, Ali Farhadi, Ranjay Krishna

🏛 Institutions
AI2, UW, UNC
📅 Date
April 9, 2026
📑 Publisher
arXiv
💻 Env
Web
🔑 Keywords
TLDR

MolmoWeb is a family of fully open multimodal web agents (4B and 8B) trained on MolmoWebMix (100K+ synthetic trajectories and 30K+ human demonstrations). The models operate as screenshot-only visual-language action policies, with no access to HTML or the accessibility tree, and achieve state-of-the-art results on WebVoyager, Online-Mind2Web, and DeepShop, outperforming larger closed models such as GPT-4o.
