GUI Agents Papers

Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal Attacks

Haoyu Liu, Dingcheng Li, Lukas Rutishauser, Zeyu Zheng

🏛 Institutions
UC Berkeley, IEOR & BAIR, Google, Google DeepMind
📅 Date
March 4, 2026
📑 Publisher
arXiv
💻 Env
Web
TLDR

DMAST is a three-stage adversarial safety training pipeline for multimodal web agents that jointly reason over screenshots and accessibility trees. It targets cross-modal DOM injection attacks, substantially reducing adversarial risk while improving efficiency on out-of-distribution MiniWob++ tasks.
