MobileDreamer: Generative Sketch World Model for GUI Agent
Yilin Cao, Yufeng Zhong, Zhixiong Zeng, Liming Zheng, Jing Huang, Haibo Qiu, Peng Shi, Wenji Mao, Wan Guanglu
- 🏛 Institutions
- State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, CAS, University of Chinese Academy of Sciences, Meituan
- 📅 Date
- January 7, 2026
- 📑 Publisher
- arXiv
- 💻 Env
- Mobile
- 🔑 Keywords
TLDR
MobileDreamer equips mobile GUI agents with a lightweight world model that predicts task-relevant textual sketches of future interface states instead of full screenshots. It then uses rollout imagination over those predicted futures for action selection, improving AndroidWorld performance by 5.25% and reaching state of the art.
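To make the idea of action selection by rollout imagination concrete, here is a minimal sketch of generic lookahead over text sketches. It is not the paper's implementation: the names `select_action`, `rollout_value`, `toy_world_model`, `toy_propose`, and `toy_score`, the string-based states, and the max-over-futures scoring rule are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interfaces (assumptions, not the paper's API):
WorldModel = Callable[[str, str], str]   # (current sketch, action) -> predicted next sketch
Proposer = Callable[[str], List[str]]    # sketch -> candidate actions
Scorer = Callable[[str, str], float]     # (task, sketch) -> estimated task progress


def rollout_value(task: str, sketch: str, action: str, world_model: WorldModel,
                  propose: Proposer, score: Scorer, depth: int) -> float:
    """Imagine up to `depth` steps starting with `action`; return the best reachable score."""
    next_sketch = world_model(sketch, action)
    candidates = propose(next_sketch) if depth > 1 else []
    if not candidates:
        # Leaf of the imagined rollout: score the predicted sketch directly.
        return score(task, next_sketch)
    return max(
        rollout_value(task, next_sketch, a, world_model, propose, score, depth - 1)
        for a in candidates
    )


def select_action(task: str, sketch: str, world_model: WorldModel,
                  propose: Proposer, score: Scorer, depth: int = 2) -> str:
    """Pick the candidate action whose imagined future scores highest."""
    candidates = propose(sketch)
    if not candidates:
        raise ValueError("no candidate actions for the current sketch")
    return max(
        candidates,
        key=lambda a: rollout_value(task, sketch, a, world_model, propose, score, depth),
    )


# Toy stand-ins: a lookup table plays the role of the learned world model.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("home", "open_mail"): "mail",
    ("home", "open_settings"): "settings",
    ("mail", "tap_inbox"): "inbox",
    ("settings", "tap_wifi"): "wifi_page",
}


def toy_world_model(sketch: str, action: str) -> str:
    # Unknown (sketch, action) pairs leave the state unchanged.
    return TRANSITIONS.get((sketch, action), sketch)


def toy_propose(sketch: str) -> List[str]:
    return [a for (s, a) in TRANSITIONS if s == sketch]


def toy_score(task: str, sketch: str) -> float:
    # Toy reward: 1.0 only when the imagined sketch is the goal screen.
    return 1.0 if sketch == "wifi_page" else 0.0


best = select_action("turn on wifi", "home", toy_world_model, toy_propose, toy_score, depth=2)
print(best)
```

In this toy setup, neither first action reaches the goal screen in one step, so a depth of 2 is needed before the two candidates can be told apart. A real agent would replace the lookup table with a learned sketch predictor and the scorer with a model-based judgment of task progress.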
Related papers
- UI-Oceanus: Scaling GUI Agents with Synthetic Environmental Dynamics · February 11, 2026 · arXiv
- Code2World: A GUI World Model via Renderable Code Generation · February 10, 2026 · arXiv
- MobileWorldBench: Towards Semantic World Modeling For Mobile Agents · December 16, 2025 · arXiv
- Unlocking Smarter Device Control: Foresighted Planning with a World Model-Driven Code Execution Approach · May 22, 2025 · Findings of EMNLP 2025
- World-Model-Augmented Web Agents with Action Correction · February 17, 2026 · arXiv
- WebWorld: A Large-Scale World Model for Web Agent Training · February 16, 2026 · arXiv