VLM Agents Generate Their Own Memories: Distilling Experience into Embodied Programs of Thought
Gabriel Herbert Sarch, Lawrence Jang, Michael J. Tarr, William W. Cohen, Kenneth Marino, Katerina Fragkiadaki
- 🏛 Institutions
- Carnegie Mellon University, Google DeepMind
- 📅 Date
- June 20, 2024
- 📑 Publisher
- NeurIPS 2024 (Spotlight)
- 💻 Env
- TEACh, VisualWebArena, Ego4D
- 🔑 Keywords
TLDR
ICAL (In-Context Abstraction Learning) distills suboptimal demonstrations and human feedback into reusable multimodal memories that improve VLM and LLM agents across TEACh, VisualWebArena, and Ego4D. It is relevant to GUI work because one evaluation domain is web agents (VisualWebArena), but the method itself is a broader embodied-agent memory approach rather than a direct GUI paper.
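To make the distill-then-retrieve loop in the TLDR concrete, here is a minimal Python sketch. Everything below is illustrative, not the paper's code: `ICALStyleMemoryBank`, `abstract_fn`, and the bag-of-words retrieval are hypothetical stand-ins for ICAL's actual components (a VLM that abstracts trajectories into programs of thought, and a learned retriever that surfaces stored memories as in-context examples).

```python
from dataclasses import dataclass
from typing import Callable, List
from collections import Counter
import math

@dataclass
class Memory:
    """One distilled experience: the raw trajectory plus its abstraction."""
    task: str
    trajectory: str   # serialized (state, action) steps, possibly suboptimal
    abstraction: str  # generated "program of thought" summarizing the trajectory

def _embed(text: str) -> Counter:
    # Toy bag-of-words embedding; a real system would use a learned encoder.
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ICALStyleMemoryBank:
    """Distill trajectories into memories, then retrieve them as in-context examples."""

    def __init__(self, abstract_fn: Callable[[str, str], str]):
        # abstract_fn stands in for a VLM prompted to rewrite a trajectory
        # into an abstracted plan (hypothetical interface).
        self.abstract_fn = abstract_fn
        self.bank: List[Memory] = []

    def distill(self, task: str, trajectory: str) -> Memory:
        # Abstract the raw experience and store it for later reuse.
        mem = Memory(task, trajectory, self.abstract_fn(task, trajectory))
        self.bank.append(mem)
        return mem

    def retrieve(self, task: str, k: int = 3) -> List[Memory]:
        # Return the k stored memories whose tasks best match the new task.
        q = _embed(task)
        ranked = sorted(self.bank, key=lambda m: _cosine(q, _embed(m.task)), reverse=True)
        return ranked[:k]

if __name__ == "__main__":
    # Stub in place of a real VLM call.
    fake_vlm = lambda task, traj: f"# Plan for: {task}\n# Distilled from: {traj}"
    bank = ICALStyleMemoryBank(fake_vlm)
    bank.distill("put the mug in the sink", "goto(counter); pickup(mug); goto(sink); put(mug)")
    bank.distill("buy the cheapest red shoes", "search('red shoes'); sort('price'); click(item_0)")
    for mem in bank.retrieve("place the cup in the sink", k=1):
        print(mem.abstraction)
```

Retrieved abstractions would be prepended to the agent's prompt as in-context examples; the key design choice the paper argues for is storing the abstracted memory rather than the raw, possibly noisy trajectory.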
Related papers
- Hybrid Self-evolving Structured Memory for GUI Agents · March 11, 2026 · arXiv
- Enhancing Web Agents with a Hierarchical Memory Tree · March 7, 2026 · arXiv
- Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents · February 15, 2026 · arXiv
- VenusBench-Mobile: A Challenging and User-Centric Benchmark for Mobile GUI Agents with Capability Diagnostics · February 6, 2026 · arXiv
- UI-Mem: Self-Evolving Experience Memory for Online Reinforcement Learning in Mobile GUI Agents · February 5, 2026 · arXiv
- MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments · February 3, 2026 · arXiv