WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models

Hongliang He, Wenlin Yao, Kaixin Ma, Wenhao Yu, Yong Dai, Hongming Zhang, Zhenzhong Lan, Dong Yu

🏛 Institutions: ZJU, Tencent AI Lab, Westlake University
📅 Date: January 25, 2024
📑 Publisher: ACL 2024
💻 Env: Web
🔑 Keywords: benchmark, automatic evaluation, GPT-4V judge, real-world website tasks, WebVoyager

TLDR

WebVoyager is an end-to-end multimodal web agent, evaluated on a benchmark of tasks drawn from 15 live websites. The paper also introduces a GPT-4V-based automatic evaluation protocol and reports 85.3% agreement with human judgment.
