GUI Agents Papers

Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal Attacks

Haoyu Liu, Dingcheng Li, Lukas Rutishauser, Zeyu Zheng

🏛 Institutions
UC Berkeley, IEOR & BAIR, Google, Google DeepMind
📅 Date
March 4, 2026
📑 Publisher
arXiv
💻 Env
Web
🔑 Keywords
TLDR

DMAST is a three-stage adversarial safety training pipeline for multimodal web agents that jointly reason over screenshots and accessibility trees. It targets cross-modal DOM injection attacks, substantially reducing adversarial risk while improving efficiency on out-of-distribution MiniWoB++ tasks.
