Scaling Web Agent Training through Automatic Data Generation and Fine-grained Evaluation

Lajanugen Logeswaran, Jaekyeom Kim, Sungryull Sohn, Creighton Glasscock, Honglak Lee

🏛 Institutions: LG AI Research
📅 Date: February 13, 2026
📑 Publisher: COLM 2025
💻 Env: Web
🔑 Keywords: data generation, fine-grained evaluation, constraint-based evaluator, BookingArena, knowledge distillation, partial progress

TLDR

This paper builds a scalable web-agent training pipeline around a constraint-based evaluator that scores partial progress rather than only final task success. It introduces BookingArena and shows that automatically generated data, combined with fine-grained evaluation, can train smaller web agents that match or exceed much larger systems.
