VLM Agents Generate Their Own Memories: Distilling Experience into Embodied Programs of Thought
Gabriel Herbert Sarch, Lawrence Jang, Michael J. Tarr, William W. Cohen, Kenneth Marino, Katerina Fragkiadaki
- 🏛 Institutions
- Carnegie Mellon University, Google DeepMind
- 📅 Date
- June 20, 2024
- 📑 Publisher
- NeurIPS 2024 (Spotlight)
- 💻 Env
- TEACh, VisualWebArena, Ego4D
- 🔑 Keywords
TLDR
ICAL (In-Context Abstraction Learning) distills suboptimal demonstrations and human feedback into reusable multimodal memories that improve VLM and LLM agents across TEACh, VisualWebArena, and Ego4D. It is relevant to GUI work because one evaluation domain is web agents (VisualWebArena), but the method itself is a broader embodied-agent memory approach rather than a direct GUI paper.
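To make the distill-then-retrieve loop in the TLDR concrete, here is a minimal Python sketch. Everything below is illustrative, not the paper's code: `ICALStyleMemoryBank`, `abstract_fn`, and the bag-of-words retrieval are hypothetical stand-ins for ICAL's actual components (a VLM that abstracts trajectories into programs of thought, and a learned retriever that surfaces stored memories as in-context examples).

```python
from dataclasses import dataclass
from typing import Callable, List
from collections import Counter
import math

@dataclass
class Memory:
    """One distilled experience: the raw trajectory plus its abstraction."""
    task: str
    trajectory: str   # serialized (state, action) steps, possibly suboptimal
    abstraction: str  # generated "program of thought" summarizing the trajectory

def _embed(text: str) -> Counter:
    # Toy bag-of-words embedding; a real system would use a learned encoder.
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ICALStyleMemoryBank:
    """Distill trajectories into memories, then retrieve them as in-context examples."""

    def __init__(self, abstract_fn: Callable[[str, str], str]):
        # abstract_fn stands in for a VLM prompted to rewrite a trajectory
        # into an abstracted plan (hypothetical interface).
        self.abstract_fn = abstract_fn
        self.bank: List[Memory] = []

    def distill(self, task: str, trajectory: str) -> Memory:
        # Abstract the raw experience and store it for later reuse.
        mem = Memory(task, trajectory, self.abstract_fn(task, trajectory))
        self.bank.append(mem)
        return mem

    def retrieve(self, task: str, k: int = 3) -> List[Memory]:
        # Return the k stored memories whose tasks best match the new task.
        q = _embed(task)
        ranked = sorted(self.bank, key=lambda m: _cosine(q, _embed(m.task)), reverse=True)
        return ranked[:k]

if __name__ == "__main__":
    # Stub in place of a real VLM call.
    fake_vlm = lambda task, traj: f"# Plan for: {task}\n# Distilled from: {traj}"
    bank = ICALStyleMemoryBank(fake_vlm)
    bank.distill("put the mug in the sink", "goto(counter); pickup(mug); goto(sink); put(mug)")
    bank.distill("buy the cheapest red shoes", "search('red shoes'); sort('price'); click(item_0)")
    for mem in bank.retrieve("place the cup in the sink", k=1):
        print(mem.abstraction)
```

Retrieved abstractions would be prepended to the agent's prompt as in-context examples; the key design choice the paper argues for is storing the abstracted memory rather than the raw, possibly noisy trajectory.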
Related papers
- Hybrid Self-evolving Structured Memory for GUI Agents · March 11, 2026 · arXiv
- Enhancing Web Agents with a Hierarchical Memory Tree · March 7, 2026 · arXiv
- Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents · February 15, 2026 · arXiv
- VenusBench-Mobile: A Challenging and User-Centric Benchmark for Mobile GUI Agents with Capability Diagnostics · February 6, 2026 · arXiv
- UI-Mem: Self-Evolving Experience Memory for Online Reinforcement Learning in Mobile GUI Agents · February 5, 2026 · arXiv
- MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments · February 3, 2026 · arXiv