WebChain: A Large-Scale Human-Annotated Dataset of Real-World Web Interaction Traces
Sicheng Fan, Rui Wan, Yifei Leng, Gaoning Liang, Li Ling, Yanyi Shang, Dehan Kong
- 🏛 Institutions
- Fudan, IMean AI
- 📅 Date
- March 5, 2026
- 📑 Publisher
- arXiv
- 💻 Env
- Web
- 🔑 Keywords
TLDR
WebChain is a large human-annotated dataset of real-world web interaction traces with aligned visual, structural, and action supervision. The paper also proposes Dual Mid-Training, which separates spatial grounding from planning and improves performance on WebChainBench and other public web-agent benchmarks.
Related papers
- WebForge: Breaking the Realism-Reproducibility-Scalability Trilemma in Browser Agent BenchmarkApril 13, 2026 · arXiv
- WebArena-Infinity: Generating Browser Environments with Verifiable Tasks at ScaleMarch 2026 · Blog Post
- Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web AgentsAugust 3, 2025 · ICLR 2026 (Poster)
- Web-Shepherd: Advancing PRMs for Reinforcing Web AgentsMay 21, 2025 · NeurIPS 2025 (Spotlight)
- RealWebAssist: A Benchmark for Long-Horizon Web Assistance with Real-World UsersApril 14, 2025 · AAAI 2026
- AgentRewardBench: Evaluating Automatic Evaluations of Web Agent TrajectoriesApril 11, 2025 · COLM 2025