Dissecting Adversarial Robustness of Multimodal LM Agents
Chen Henry Wu, Rishi Shah, Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried, Aditi Raghunathan
- 🏛 Institutions: CMU
- 📅 Date: June 18, 2024
- 📑 Publisher: ICLR 2025 (Poster)
- 💻 Env: Web
- 🔑 Keywords
TLDR
The paper builds an adversarial extension of VisualWebArena with 200 targeted tasks and introduces the Agent Robustness Evaluation (ARE) framework for analyzing how attacks propagate through compound agent systems. It shows that small visual or textual perturbations can reliably hijack strong multimodal web agents, including variants that use reflection or tree search.
Related papers (24)
- Attacking Vision-Language Computer Agents via Pop-ups · November 4, 2024 · ACL 2025
- WebTrap Park: An Automated Platform for Systematic Security Evaluation of Web Agents · January 13, 2026 · arXiv
- It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents · December 29, 2025 · arXiv
- DECEPTICON: How Dark Patterns Manipulate Web Agents · December 28, 2025 · arXiv
- Genesis: Evolving Attack Strategies for LLM Web Agent Red-Teaming · October 21, 2025 · ICME 2026
- Investigating the Impact of Dark Patterns on LLM-Based Web Agents · October 20, 2025 · IEEE S&P 2026
- RiOSWorld: Benchmarking the Risk of Multimodal Computer-Use Agents · May 31, 2025 · NeurIPS 2025 (Poster)
- Refusal-Trained LLMs Are Easily Jailbroken As Browser Agents · October 11, 2024 · arXiv
- ST-WebAgentBench: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents · October 9, 2024 · ICLR 2026 (Poster)
- VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks · January 24, 2024 · ACL 2024
- The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use Agents · April 12, 2026 · arXiv
- CORA: Conformal Risk-Controlled Agents for Safeguarded Mobile GUI Automation · April 10, 2026 · arXiv
- Preference Redirection via Attention Concentration: An Attack on Computer Use Agents · April 9, 2026 · arXiv
- When Benign Inputs Lead to Severe Harms: Eliciting Unsafe Unintended Behaviors of Computer-Use Agents · February 9, 2026 · arXiv
- When Actions Go Off-Task: Detecting and Correcting Misaligned Actions in Computer-Use Agents · February 9, 2026 · arXiv
- LPS-Bench: Benchmarking Safety Awareness of Computer-Use Agents in Long-Horizon Planning under Benign and Adversarial Scenarios · February 3, 2026 · arXiv
- macOSWorld: A Multilingual Interactive Benchmark for GUI Agents · June 4, 2025 · NeurIPS 2025 (Poster)
- VPI-Bench: Visual Prompt Injection Attacks for Computer-Use Agents · June 3, 2025 · ICLR 2026 (Poster)
- MobileSafetyBench: Evaluating Safety of Autonomous Agents in Mobile Device Control · October 23, 2024 · arXiv
- Odysseys: Benchmarking Web Agents on Realistic Long Horizon Tasks · April 27, 2026 · arXiv
- WebForge: Breaking the Realism-Reproducibility-Scalability Trilemma in Browser Agent Benchmark · April 13, 2026 · arXiv
- The Amazing Agent Race: Strong Tool Users, Weak Navigators · April 11, 2026 · arXiv
- ClawBench: Can AI Agents Complete Everyday Online Tasks? · April 9, 2026 · arXiv
- GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents · April 8, 2026 · arXiv