Mind2Web: Towards a Generalist Agent for the Web
Xiang Deng, Yu Gu, Boyuan Zheng, Shijie Chen, Sam Stevens, Boshi Wang, Huan Sun, Yu Su
- 🏛 Institutions
- OSU
- 📅 Date
- June 9, 2023
- 📑 Publisher
- NeurIPS 2023 Datasets and Benchmarks Track
- 💻 Env
- Web
- 🔑 Keywords
TLDR
Introduces Mind2Web, a benchmark of realistic language-guided web tasks across 137 websites and 31 domains. The companion MindAct framework uses smaller models for element ranking to help larger models operate on messy real-world HTML, making the paper a cornerstone for modern web-agent evaluation.
Related papers
- WebForge: Breaking the Realism-Reproducibility-Scalability Trilemma in Browser Agent BenchmarkApril 13, 2026 · arXiv
- WebArena-Infinity: Generating Browser Environments with Verifiable Tasks at ScaleMarch 2026 · Blog Post
- WebChain: A Large-Scale Human-Annotated Dataset of Real-World Web Interaction TracesMarch 5, 2026 · arXiv
- Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web AgentsAugust 3, 2025 · ICLR 2026 (Poster)
- Web-Shepherd: Advancing PRMs for Reinforcing Web AgentsMay 21, 2025 · NeurIPS 2025 (Spotlight)
- RealWebAssist: A Benchmark for Long-Horizon Web Assistance with Real-World UsersApril 14, 2025 · AAAI 2026