GUI Agents Papers

Caution for the Environment: Multimodal LLM Agents are Susceptible to Environmental Distractions

Xinbei Ma, Yiting Wang, Yao Yao, Tongxin Yuan, Aston Zhang, Zhuosheng Zhang, Hai Zhao

🏛 Institutions
Shanghai Jiao Tong University, Meta
📅 Date
August 5, 2024
📑 Publisher
ACL 2025
💻 Env
🔑 Keywords
TLDR

This paper shows that multimodal LLM agents are vulnerable to environmental distractions. The authors demonstrate that these agents, which process multiple types of input (e.g., text, images, audio), can be significantly affected by irrelevant or misleading cues in their environment. The study exposes the limitations of current multimodal systems and argues for more robust architectures that filter out distractions and stay focused on task-relevant information in complex, real-world environments.
