WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents

Shunyu Yao, Howard Chen, John Yang, Karthik Narasimhan

🏛 Institutions: Princeton
📅 Date: July 31, 2022
📑 Publisher: NeurIPS 2022
💻 Env: Web
🔑 Keywords: environment, dataset, benchmark, e-commerce, web interaction, WebShop

TLDR

Introduces WebShop, a simulated e-commerce web environment with 1.18 million real-world products and 12,087 crowd-sourced shopping instructions for training and evaluating grounded language agents. It became an early standard benchmark for web agents by combining realistic web interaction, compositional search, and sim-to-real evaluation.


Related papers (24)

WebForge: Breaking the Realism-Reproducibility-Scalability Trilemma in Browser Agent Benchmark

April 13, 2026 · arXiv
WebArena-Infinity: Generating Browser Environments with Verifiable Tasks at Scale

March 2026 · Blog Post
WebChain: A Large-Scale Human-Annotated Dataset of Real-World Web Interaction Traces

March 5, 2026 · arXiv
Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents

August 3, 2025 · ICLR 2026 (Poster)
Web-Shepherd: Advancing PRMs for Reinforcing Web Agents

May 21, 2025 · NeurIPS 2025 (Spotlight)
RealWebAssist: A Benchmark for Long-Horizon Web Assistance with Real-World Users

April 14, 2025 · AAAI 2026
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories

April 11, 2025 · COLM 2025
WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks

July 7, 2024 · NeurIPS 2024 Datasets and Benchmarks Track (Poster)
GUI Action Narrator: Where and When Did That Action Take Place?

June 19, 2024 · arXiv
WebCanvas: Benchmarking Web Agents in Online Environments

June 18, 2024 · arXiv
GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding

June 16, 2024 · ICLR 2025 (Poster)
OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web

February 29, 2024 · ECCV 2024 (Poster)
On the Multi-turn Instruction Following for Conversational Web Agents

February 23, 2024 · ACL 2024
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue

February 8, 2024 · ICML 2024
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents

January 17, 2024 · ACL 2024
WebVLN: Vision-and-Language Navigation on Websites

December 25, 2023 · AAAI 2024
WebArena: A Realistic Web Environment for Building Autonomous Agents

July 25, 2023 · ICLR 2024 (Poster)
Mind2Web: Towards a Generalist Agent for the Web

June 9, 2023 · NeurIPS 2023 Datasets and Benchmarks Track
Grounding Open-Domain Instructions to Automate Web Support Tasks

March 30, 2021 · NAACL 2021
WebSRC: A Dataset for Web-Based Structural Reading Comprehension

January 23, 2021 · EMNLP 2021
Gym-Anything: Turn any Software into an Agent Environment

April 7, 2026 · arXiv
PSPA-Bench: A Personalized Benchmark for Smartphone GUI Agent

March 31, 2026 · arXiv
SecAgent: Efficient Mobile GUI Agent with Semantic Context

March 9, 2026 · arXiv
Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization

February 24, 2026 · arXiv