Naive Visual Memory is Not Enough: A Failure-Mode Study of GUI Agents

Seoyoung Choi, Minseok Ko, Hyunseok Lee, Kunwoong Kim, Woomin Song, Chanseok Jeon, Jinwoo Shin

🏛 Institutions: Unknown
📅 Date: June 12, 2026
📑 Publisher: arXiv
💻 Env: General GUI
🔑 Keywords: visual memory, failure modes, experiential memory, grounding error

TLDR

This paper studies how visual memory affects GUI agents by classifying failures into cognitive failure, visual state misunderstanding, hidden operation blindness, and grounding error. It finds that full-image memory can reduce state-level failures while worsening action-level failures such as hidden operation blindness and grounding errors.
