Visual Confused Deputy: Exploiting and Defending Perception Failures in Computer-Using Agents

Xunzhuo Liu, Bowei He, Xue Liu, Andy Luo, Haichen Zhang, Huamin Chen

🏛 Institutions: McGill University, AMD, Red Hat
📅 Date: March 16, 2026
📑 Publisher: arXiv
💻 Env: General GUI
🔑 Keywords: security guardrail, visual confused deputy, TOCTOU, grounding errors, dual-channel contrastive classification

TLDR

This paper reframes perception failures in GUI agents as a security problem rather than only a performance issue. It formalizes the visual confused deputy, in which an agent that misperceives the UI state performs a privileged action on the wrong target. It then proposes a dual-channel guardrail that checks the visual target and the agent's textual reasoning separately and blocks executions that appear unsafe.
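
The dual-channel idea can be illustrated with a minimal sketch. This is not the paper's implementation: the names `ProposedAction`, `dual_channel_guard`, `toy_visual_check`, and `toy_text_check` are hypothetical, the keyword-based checks stand in for whatever classifiers the authors use, and blocking when either channel flags the action is an assumed combination rule.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProposedAction:
    """Hypothetical record of one action the agent wants to execute."""
    target_label: str  # stand-in for what is actually rendered at the click point
    reasoning: str     # the agent's textual rationale for the action


# A check returns True when its channel considers the action unsafe.
Check = Callable[[ProposedAction], bool]


def dual_channel_guard(action: ProposedAction,
                       visual_check: Check,
                       text_check: Check) -> bool:
    """Return True if the action may execute.

    Assumption: the action is blocked when either channel flags it.
    """
    visual_unsafe = visual_check(action)
    text_unsafe = text_check(action)
    return not (visual_unsafe or text_unsafe)


# Toy stand-ins for the two channels (assumptions, not the paper's classifiers).
RISKY_TARGETS = {"delete account", "confirm payment"}


def toy_visual_check(action: ProposedAction) -> bool:
    # Flags the action if the on-screen target is a known risky control.
    return action.target_label.lower() in RISKY_TARGETS


def toy_text_check(action: ProposedAction) -> bool:
    # Flags the action if the stated reasoning mentions a destructive intent.
    return "delete" in action.reasoning.lower()


# The agent believes it is clicking "Save", but the control at the click
# point is actually "Delete account": the visual confused deputy case.
confused = ProposedAction(target_label="Delete account",
                          reasoning="Click the Save button to keep changes")
benign = ProposedAction(target_label="Save",
                        reasoning="Click the Save button to keep changes")

allowed_confused = dual_channel_guard(confused, toy_visual_check, toy_text_check)
allowed_benign = dual_channel_guard(benign, toy_visual_check, toy_text_check)
```

In this toy setup the confused action is blocked even though the agent's reasoning looks harmless, because the visual channel inspects the actual target independently of what the agent says it is doing.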
