Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
Yu Gu, Kai Zhang, Yuting Ning, Boyuan Zheng, Boyu Gou, Tianci Xue, Cheng Chang, Sanjari Srivastava, Yanan Xie, Peng Qi, Huan Sun, Yu Su
- 🏛 Institutions
- OSU, Uniphore, Orby AI
- 📅 Date
- November 10, 2024
- 📑 Publisher
- TMLR
- 💻 Env
- Web
- 🔑 Keywords
TLDR
This paper argues that web agents should use model-based planning instead of relying heavily on backtracking search in irreversible web environments. The proposed WebDreamer framework uses an LLM world model to simulate candidate action outcomes before acting, improving over reactive baselines on benchmarks such as VisualWebArena, Online-Mind2Web, and Mind2Web-Live.
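The simulate-then-act idea in the TLDR can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `plan_step`, `toy_simulate`, `toy_score`, and `TOY_OUTCOMES` are hypothetical names, and the dictionary lookup stands in for what would be an LLM call predicting the outcome of an action.

```python
from typing import Callable, Sequence


def plan_step(
    state: str,
    candidates: Sequence[str],
    simulate: Callable[[str, str], str],
    score: Callable[[str], float],
) -> str:
    """Return the candidate action whose simulated outcome scores highest."""
    if not candidates:
        raise ValueError("need at least one candidate action")
    best_action, best_value = candidates[0], float("-inf")
    for action in candidates:
        # Imagine the next state instead of executing the action on the site.
        predicted = simulate(state, action)
        # Estimate how much the imagined state advances the task.
        value = score(predicted)
        if value > best_value:
            best_action, best_value = action, value
    return best_action


# Toy stand-ins (assumptions): a lookup table plays the role of the world model.
TOY_OUTCOMES = {
    "click 'Checkout'": "checkout form with shipping fields",
    "click 'Home'": "home page with product list",
}


def toy_simulate(state: str, action: str) -> str:
    # Unknown actions are assumed to leave the page unchanged.
    return TOY_OUTCOMES.get(action, state)


def toy_score(predicted: str) -> float:
    # Hypothetical task: reach the checkout page.
    return 1.0 if "checkout" in predicted else 0.0


chosen = plan_step(
    "cart page",
    ["click 'Home'", "click 'Checkout'"],
    toy_simulate,
    toy_score,
)
print(chosen)
```

Only the selected action would then be executed in the real browser, which is the point of simulating first when actions cannot be undone.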
Related papers (24)
- World-Model-Augmented Web Agents with Action Correction · February 17, 2026 · arXiv
- WebWorld: A Large-Scale World Model for Web Agent Training · February 16, 2026 · arXiv
- DynaWeb: Model-Based Reinforcement Learning of Web Agents · January 29, 2026 · arXiv
- UI-Oceanus: Scaling GUI Agents with Synthetic Environmental Dynamics · February 11, 2026 · arXiv
- Code2World: A GUI World Model via Renderable Code Generation · February 10, 2026 · arXiv
- SafePred: A Predictive Guardrail for Computer-Using Agents via World Models · February 2, 2026 · arXiv
- MobileDreamer: Generative Sketch World Model for GUI Agent · January 7, 2026 · arXiv
- MobileWorldBench: Towards Semantic World Modeling For Mobile Agents · December 16, 2025 · arXiv
- R-WoM: Retrieval-augmented World Model For Computer-use Agents · October 13, 2025 · ICLR 2026 (Poster)
- Unlocking Smarter Device Control: Foresighted Planning with a World Model-Driven Code Execution Approach · May 22, 2025 · Findings of EMNLP 2025
- GUI Agents for Continual Game Generation · May 27, 2026 · arXiv
- Odysseys: Benchmarking Web Agents on Realistic Long Horizon Tasks · April 27, 2026 · arXiv
- WebForge: Breaking the Realism-Reproducibility-Scalability Trilemma in Browser Agent Benchmark · April 13, 2026 · arXiv
- The Amazing Agent Race: Strong Tool Users, Weak Navigators · April 11, 2026 · arXiv
- Same Outcomes, Different Journeys: A Trace-Level Framework for Comparing Human and GUI-Agent Behavior in Production Search Systems · April 9, 2026 · arXiv
- MolmoWeb: Open Visual Web Agent and Open Data for the Open Web · April 9, 2026 · arXiv
- ClawBench: Can AI Agents Complete Everyday Online Tasks? · April 9, 2026 · arXiv
- GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents · April 8, 2026 · arXiv
- WebSP-Eval: Evaluating Web Agents on Website Security and Privacy Tasks · April 7, 2026 · arXiv
- The Art of Building Verifiers for Computer Use Agents · April 5, 2026 · arXiv
- The Tool Illusion: Rethinking Tool Use in Web Agents · April 3, 2026 · arXiv
- When Users Change Their Mind: Evaluating Interruptible Agents in Long-Horizon Web Navigation · April 1, 2026 · arXiv
- WebArena-Infinity: Generating Browser Environments with Verifiable Tasks at Scale · March 2026 · Blog Post
- Vision2Web: A Hierarchical Benchmark for Visual Website Development with Agent Verification · March 27, 2026 · arXiv