WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models
Hongliang He, Wenlin Yao, Kaixin Ma, Wenhao Yu, Yong Dai, Hongming Zhang, Zhenzhong Lan, Dong Yu
- 🏛 Institutions
- ZJU, Tencent AI Lab, Westlake University
- 📅 Date
- January 25, 2024
- 📑 Publisher
- ACL 2024
- 💻 Env
- Web
- 🔑 Keywords
TLDR
WebVoyager is an end-to-end multimodal web agent, evaluated on a benchmark of tasks drawn from 15 live websites. The paper also introduces a GPT-4V-based automatic evaluation protocol and reports that its verdicts agree with human judgment 85.3% of the time.
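To make the agreement figure concrete, here is a minimal sketch of how a percent-agreement score between an automatic evaluator and human annotators can be computed. The function name `agreement_rate` and the toy labels are hypothetical, and treating the reported number as plain per-task percent agreement is an assumption, not something this summary confirms.

```python
from typing import Sequence


def agreement_rate(auto_labels: Sequence[bool], human_labels: Sequence[bool]) -> float:
    """Return the fraction of tasks on which both evaluators give the same verdict.

    Each label is True if the evaluator judged the task successful.
    This is simple percent agreement (an assumption about the metric).
    """
    if len(auto_labels) != len(human_labels):
        raise ValueError("label sequences must have the same length")
    if not auto_labels:
        raise ValueError("at least one labeled task is required")
    matches = sum(a == h for a, h in zip(auto_labels, human_labels))
    return matches / len(auto_labels)


# Toy data, invented for illustration only (not from the paper).
auto = [True, True, False, True, False, True]
human = [True, False, False, True, False, True]

rate = agreement_rate(auto, human)
print(f"agreement: {rate:.1%}")
```

Percent agreement is easy to read but does not correct for agreement that would occur by chance, so it is often reported alongside a chance-corrected statistic.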
Related papers
- REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites · April 15, 2025 · arXiv
- SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation · October 19, 2024 · ICLR 2025 (Spotlight)
- Odysseys: Benchmarking Web Agents on Realistic Long Horizon Tasks · April 27, 2026 · arXiv
- WebForge: Breaking the Realism-Reproducibility-Scalability Trilemma in Browser Agent Benchmark · April 13, 2026 · arXiv
- The Amazing Agent Race: Strong Tool Users, Weak Navigators · April 11, 2026 · arXiv
- ClawBench: Can AI Agents Complete Everyday Online Tasks? · April 9, 2026 · arXiv