WebFactory: Automated Compression of Foundational Language Intelligence into Grounded Web Agents

Sicheng Fan, Qingyun Shi, Shengze Xu, Shengbo Cai, Tieyong Zeng, Li Ling, Yanyi Shang, Dehan Kong

🏛 Institutions: Fudan, IMean AI, CUHK, Tsinghua
📅 Date: March 5, 2026
📑 Publisher: arXiv
💻 Env: Web
🔑 Keywords: reinforcement learning, synthetic data, environment synthesis, task generation, embodiment potential, WebFactory

TLDR

WebFactory presents a closed-loop training pipeline that compresses an LLM's latent internet knowledge into grounded web-agent behavior through four stages: synthetic environment generation, task generation, trajectory collection, and decomposed-reward RL. Using synthetic data from only 10 websites, it matches agents trained on comparable amounts of human data.
