AdvAgent: Controllable Blackbox Red-teaming on Web Agents
Chejian Xu, Mintong Kang, Jiawei Zhang, Zeyi Liao, Lingbo Mo, Mengqi Yuan, Huan Sun, Bo Li
- 🏛 Institutions
- UIUC, University of Chicago, OSU
- 📅 Date
- October 22, 2024
- 📑 Publisher
- ICML 2025 (Poster)
- 💻 Env
- Web
- 🔑 Keywords
TLDR
AdvAgent is a black-box red-teaming method for web agents that trains an adversarial prompter with DPO to generate stealthy, controllable attacks against frontier browser agents. The paper shows high attack success rates across realistic web tasks and finds that existing prompt-based defenses provide limited protection.
Related papers
- Refusal-Trained LLMs Are Easily Jailbroken As Browser AgentsOctober 11, 2024 · arXiv
- Are GUI Agents Focused Enough? Automated Distraction via Semantic-level UI Element InjectionApril 9, 2026 · arXiv
- When Benign Inputs Lead to Severe Harms: Eliciting Unsafe Unintended Behaviors of Computer-Use AgentsFebruary 9, 2026 · arXiv
- Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal AttacksMarch 4, 2026 · arXiv
- WebTrap Park: An Automated Platform for Systematic Security Evaluation of Web AgentsJanuary 13, 2026 · arXiv
- It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web AgentsDecember 29, 2025 · arXiv