Caution for the Environment: Multimodal LLM Agents are Susceptible to Environmental Distractions

Xinbei Ma, Yiting Wang, Yao Yao, Tongxin Yuan, Aston Zhang, Zhuosheng Zhang, Hai Zhao

🏛 Institutions: Shanghai Jiao Tong University, Meta
📅 Date: August 5, 2024
📑 Publisher: ACL 2025
💻 Env
🔑 Keywords: safety, robustness, environmental distraction, multimodal LLM agent

TLDR

This paper shows that multimodal LLM agents are vulnerable to environmental distractions. The researchers demonstrate that these agents, which process multiple types of input (e.g., text, images, audio), can be significantly affected by irrelevant or misleading cues in their environment. The study exposes this limitation in current multimodal systems and argues for more robust architectures that filter out distractions and stay focused on task-relevant information in complex, real-world environments.
