WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
Shunyu Yao, Howard Chen, John Yang, Karthik Narasimhan
- 🏛 Institutions
- Princeton
- 📅 Date
- July 31, 2022
- 📑 Publisher
- NeurIPS 2022
- 💻 Env
- Web
TLDR
Introduces WebShop, an e-commerce web environment with over one million products and 12,087 shopping instructions for grounded language agents. It became an early standard benchmark for web agents by combining realistic web interaction, compositional search, and sim-to-real evaluation.
Related papers (24)
- WebForge: Breaking the Realism-Reproducibility-Scalability Trilemma in Browser Agent Benchmark · April 13, 2026 · arXiv
- WebArena-Infinity: Generating Browser Environments with Verifiable Tasks at Scale · March 2026 · Blog Post
- WebChain: A Large-Scale Human-Annotated Dataset of Real-World Web Interaction Traces · March 5, 2026 · arXiv
- Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents · August 3, 2025 · ICLR 2026 (Poster)
- Web-Shepherd: Advancing PRMs for Reinforcing Web Agents · May 21, 2025 · NeurIPS 2025 (Spotlight)
- RealWebAssist: A Benchmark for Long-Horizon Web Assistance with Real-World Users · April 14, 2025 · AAAI 2026
- AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories · April 11, 2025 · COLM 2025
- WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks · July 7, 2024 · NeurIPS 2024 Datasets and Benchmarks Track (Poster)
- GUI Action Narrator: Where and When Did That Action Take Place? · June 19, 2024 · arXiv
- WebCanvas: Benchmarking Web Agents in Online Environments · June 18, 2024 · arXiv
- GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding · June 16, 2024 · ICLR 2025 (Poster)
- OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web · February 29, 2024 · ECCV 2024 (Poster)
- On the Multi-turn Instruction Following for Conversational Web Agents · February 23, 2024 · ACL 2024
- WebLINX: Real-World Website Navigation with Multi-Turn Dialogue · February 8, 2024 · ICML 2024
- SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents · January 17, 2024 · ACL 2024
- WebVLN: Vision-and-Language Navigation on Websites · December 25, 2023 · AAAI 2024
- WebArena: A Realistic Web Environment for Building Autonomous Agents · July 25, 2023 · ICLR 2024 (Poster)
- Mind2Web: Towards a Generalist Agent for the Web · June 9, 2023 · NeurIPS 2023 Datasets and Benchmarks Track
- Grounding Open-Domain Instructions to Automate Web Support Tasks · March 30, 2021 · NAACL 2021
- WebSRC: A Dataset for Web-Based Structural Reading Comprehension · January 23, 2021 · EMNLP 2021
- Gym-Anything: Turn any Software into an Agent Environment · April 7, 2026 · arXiv
- PSPA-Bench: A Personalized Benchmark for Smartphone GUI Agent · March 31, 2026 · arXiv
- SecAgent: Efficient Mobile GUI Agent with Semantic Context · March 9, 2026 · arXiv
- Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization · February 24, 2026 · arXiv