WebArena-Infinity: Generating Browser Environments with Verifiable Tasks at Scale
- 🏛 Institutions: Duke University
- 📅 Date: March 2026
- 📑 Publisher: Blog Post
- 💻 Env: Web
- 🔑 Keywords
TLDR
WebArena-Infinity automates the generation of high-authenticity web environments with verifiable tasks from static artifacts like user manuals, using a multi-agent pipeline of coding and browser-use agents. It produces 10 environments with 1,260 tasks and 2,070 trajectories. Agents achieve notably lower success rates than on manually built benchmarks, suggesting the generated tasks capture meaningful complexity.
Related papers (24)
- WebForge: Breaking the Realism-Reproducibility-Scalability Trilemma in Browser Agent Benchmark · April 13, 2026 · arXiv
- When Users Change Their Mind: Evaluating Interruptible Agents in Long-Horizon Web Navigation · April 1, 2026 · arXiv
- WebFactory: Automated Compression of Foundational Language Intelligence into Grounded Web Agents · March 5, 2026 · arXiv
- WebChain: A Large-Scale Human-Annotated Dataset of Real-World Web Interaction Traces · March 5, 2026 · arXiv
- OpAgent: Operator Agent for Web Navigation · February 14, 2026 · arXiv
- InfiniteWeb: Scalable Web Environment Synthesis for GUI Agent Training · January 7, 2026 · arXiv
- WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks · January 5, 2026 · arXiv
- Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents · August 3, 2025 · ICLR 2026 (Poster)
- Go-Browse: Training Web Agents with Structured Exploration · June 4, 2025 · ICLR 2026 (Poster)
- Web-Shepherd: Advancing PRMs for Reinforcing Web Agents · May 21, 2025 · NeurIPS 2025 (Spotlight)
- RealWebAssist: A Benchmark for Long-Horizon Web Assistance with Real-World Users · April 14, 2025 · AAAI 2026
- AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories · April 11, 2025 · COLM 2025
- WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks · July 7, 2024 · NeurIPS 2024 Datasets and Benchmarks Track (Poster)
- GUI Action Narrator: Where and When Did That Action Take Place? · June 19, 2024 · arXiv
- WebCanvas: Benchmarking Web Agents in Online Environments · June 18, 2024 · arXiv
- GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding · June 16, 2024 · ICLR 2025 (Poster)
- AutoWebGLM: A Large Language Model-based Web Navigating Agent · April 4, 2024 · KDD 2024
- OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web · February 29, 2024 · ECCV 2024 (Poster)
- On the Multi-turn Instruction Following for Conversational Web Agents · February 23, 2024 · ACL 2024
- WebLINX: Real-World Website Navigation with Multi-Turn Dialogue · February 8, 2024 · ICML 2024
- SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents · January 17, 2024 · ACL 2024
- WebVLN: Vision-and-Language Navigation on Websites · December 25, 2023 · AAAI 2024
- WebArena: A Realistic Web Environment for Building Autonomous Agents · July 25, 2023 · ICLR 2024 (Poster)
- Mind2Web: Towards a Generalist Agent for the Web · June 9, 2023 · NeurIPS 2023 Datasets and Benchmarks Track