AdvAgent: Controllable Blackbox Red-teaming on Web Agents
Chejian Xu , Mintong Kang , Jiawei Zhang , Zeyi Liao , Lingbo Mo , Mengqi Yuan , Huan Sun , Bo Li
- 🏛 Institutions
- UIUC , University of Chicago , OSU
- 📅 Date
- October 22, 2024
- 📑 Publisher
- ICML 2025 (Poster)
- 💻 Env
- Web
- 🔑 Keywords
TLDR
AdvAgent is a black-box red-teaming method for web agents that trains an adversarial prompter with DPO to generate stealthy, controllable attacks against frontier browser agents. The paper shows high attack success rates across realistic web tasks and finds that existing prompt-based defenses provide limited protection.
Related papers (24)
- Refusal-Trained LLMs Are Easily Jailbroken As Browser AgentsOctober 11, 2024 · arXiv
- Are GUI Agents Focused Enough? Automated Distraction via Semantic-level UI Element InjectionApril 9, 2026 · arXiv
- When Benign Inputs Lead to Severe Harms: Eliciting Unsafe Unintended Behaviors of Computer-Use AgentsFebruary 9, 2026 · arXiv
- Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal AttacksMarch 4, 2026 · arXiv
- WebTrap Park: An Automated Platform for Systematic Security Evaluation of Web AgentsJanuary 13, 2026 · arXiv
- It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web AgentsDecember 29, 2025 · arXiv
- DECEPTICON: How Dark Patterns Manipulate Web AgentsDecember 28, 2025 · arXiv
- Permission Manifests for Web AgentsDecember 7, 2025 · arXiv
- Genesis: Evolving Attack Strategies for LLM Web Agent Red-TeamingOctober 21, 2025 · ICME 2026
- Investigating the Impact of Dark Patterns on LLM-Based Web AgentsOctober 20, 2025 · IEEE S&P 2026
- Cross-Modal Content Optimization for Steering Web Agent PreferencesOctober 4, 2025 · arXiv
- RiOSWorld: Benchmarking the Risk of Multimodal Computer-Use AgentsMay 31, 2025 · NeurIPS 2025 (Poster)
- Attacking Vision-Language Computer Agents via Pop-upsNovember 4, 2024 · ACL 2025
- ST-WebAgentBench: A Benchmark for Evaluating Safety and Trustworthiness in Web AgentsOctober 9, 2024 · ICLR 2026 (Poster)
- EIA: Environmental Injection Attack on Generalist Web Agents for Privacy LeakageSeptember 17, 2024 · ICLR 2025 (Poster)
- Dissecting Adversarial Robustness of Multimodal LM AgentsJune 18, 2024 · ICLR 2025 (Poster)
- Human-Guided Harm Recovery for Computer Use AgentsApril 20, 2026 · arXiv
- The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use AgentsApril 12, 2026 · arXiv
- CORA: Conformal Risk-Controlled Agents for Safeguarded Mobile GUI AutomationApril 10, 2026 · arXiv
- Preference Redirection via Attention Concentration: An Attack on Computer Use AgentsApril 9, 2026 · arXiv
- When Actions Go Off-Task: Detecting and Correcting Misaligned Actions in Computer-Use AgentsFebruary 9, 2026 · arXiv
- LPS-Bench: Benchmarking Safety Awareness of Computer-Use Agents in Long-Horizon Planning under Benign and Adversarial ScenariosFebruary 3, 2026 · arXiv
- SafePred: A Predictive Guardrail for Computer-Using Agents via World ModelsFebruary 2, 2026 · arXiv
- CaMeLs Can Use Computers Too: System-level Security for Computer Use AgentsJanuary 14, 2026 · arXiv