Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal Attacks
Haoyu Liu , Dingcheng Li , Lukas Rutishauser , Zeyu Zheng
- 🏛 Institutions
- UC Berkeley , IEOR & BAIR , Google , Google DeepMind
- 📅 Date
- March 4, 2026
- 📑 Publisher
- arXiv
- 💻 Env
- Web
- 🔑 Keywords
TLDR
DMAST is a three-stage adversarial safety training pipeline for multimodal web agents that jointly reasons over screenshots and accessibility trees. It targets cross-modal DOM injection attacks, substantially reducing adversarial risk while improving efficiency on out-of-distribution MiniWob++ tasks.
Related papers (24)
- WebSentinel: Detecting and Localizing Prompt Injection Attacks for Web AgentsFebruary 3, 2026 · arXiv
- The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use AgentsApril 12, 2026 · arXiv
- Preference Redirection via Attention Concentration: An Attack on Computer Use AgentsApril 9, 2026 · arXiv
- A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron?May 16, 2025 · arXiv
- WebSP-Eval: Evaluating Web Agents on Website Security and Privacy TasksApril 7, 2026 · arXiv
- WebTrap Park: An Automated Platform for Systematic Security Evaluation of Web AgentsJanuary 13, 2026 · arXiv
- It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web AgentsDecember 29, 2025 · arXiv
- DECEPTICON: How Dark Patterns Manipulate Web AgentsDecember 28, 2025 · arXiv
- Permission Manifests for Web AgentsDecember 7, 2025 · arXiv
- Genesis: Evolving Attack Strategies for LLM Web Agent Red-TeamingOctober 21, 2025 · ICME 2026
- Investigating the Impact of Dark Patterns on LLM-Based Web AgentsOctober 20, 2025 · IEEE S&P 2026
- HackWorld: Evaluating Computer-Use Agents on Exploiting Web Application VulnerabilitiesOctober 14, 2025 · ICLR 2026 (Poster)
- Environmental Injection Attacks against GUI Agents in Realistic Dynamic EnvironmentsSeptember 14, 2025 · arXiv
- RiOSWorld: Benchmarking the Risk of Multimodal Computer-Use AgentsMay 31, 2025 · NeurIPS 2025 (Poster)
- RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS EnvironmentsMay 28, 2025 · ICLR 2026 (Oral)
- WebInject: Prompt Injection Attack to Web AgentsMay 16, 2025 · EMNLP 2025 (Poster)
- WASP: Benchmarking Web Agent Security Against Prompt Injection AttacksApril 22, 2025 · NeurIPS 2025 (Poster)
- GUI-R1: A Generalist R1-Style Vision-Language Action Model for GUI AgentsApril 14, 2025 · arXiv
- sudo rm -rf agentic_securityMarch 26, 2025 · ACL 2025 Industry Track
- In-Context Defense in Computer Agents: An Empirical StudyMarch 12, 2025 · arXiv
- Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security AnalysisFebruary 27, 2025 · arXiv
- Attacking Vision-Language Computer Agents via Pop-upsNovember 4, 2024 · ACL 2025
- AdvAgent: Controllable Blackbox Red-teaming on Web AgentsOctober 22, 2024 · ICML 2025 (Poster)
- Refusal-Trained LLMs Are Easily Jailbroken As Browser AgentsOctober 11, 2024 · arXiv