WebChain: A Large-Scale Human-Annotated Dataset of Real-World Web Interaction Traces

Sicheng Fan, Rui Wan, Yifei Leng, Gaoning Liang, Li Ling, Yanyi Shang, Dehan Kong

🏛 Institutions: Fudan, IMean AI
📅 Date: March 5, 2026
📑 Publisher: arXiv
💻 Env: Web
🔑 Keywords: dataset, benchmark, triple alignment, human annotation, dual mid-training, WebChainBench

TLDR

WebChain is a large human-annotated dataset of real-world web interaction traces with aligned visual, structural, and action supervision. The paper also proposes Dual Mid-Training, which separates spatial grounding from planning and improves performance on WebChainBench and other public web-agent benchmarks.
