MolmoWeb: Open Visual Web Agent and Open Data for the Open Web

Tanmay Gupta, Piper Wolters, Zixian Ma, Peter Sushko, Rock Yuren Pang, Diego Llanes, Yue Yang, Taira Anderson, Boyuan Zheng, Zhongzheng Ren, Harsh Trivedi, Taylor Blanton, Caleb Ouellette, Winson Han, Ali Farhadi, Ranjay Krishna

🏛 Institutions: AI2, UW, UNC
📅 Date: April 9, 2026
📑 Publisher: arXiv
💻 Env: Web
🔑 Keywords: model, dataset, MolmoWeb, open-source

TLDR

MolmoWeb is a family of fully open multimodal web agents (4B and 8B parameters) trained on MolmoWebMix (100K+ synthetic trajectories and 30K+ human demonstrations). The models act as screenshot-only vision-language action policies, with no access to HTML or the accessibility tree, and achieve state-of-the-art results on WebVoyager, Online-Mind2Web, and DeepShop, outperforming larger closed models such as GPT-4o.
