WebSentinel: Detecting and Localizing Prompt Injection Attacks for Web Agents
Xilong Wang, Yinuo Liu, Zhun Wang, Dawn Song, Neil Gong
- 🏛 Institutions
- Duke University, UC Berkeley
- 📅 Date
- February 3, 2026
- 📑 Publisher
- arXiv
- 💻 Env
- Web
- 🔑 Keywords
TLDR
WebSentinel is a two-step defense framework that detects and localizes prompt injection attacks in webpages by first extracting segments of interest that may be contaminated and then evaluating each segment's consistency with the webpage content, substantially outperforming baseline methods on multiple datasets.
Related papers
- Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal AttacksMarch 4, 2026 · arXiv
- WebInject: Prompt Injection Attack to Web AgentsMay 16, 2025 · EMNLP 2025 (Poster)
- WASP: Benchmarking Web Agent Security Against Prompt Injection AttacksApril 22, 2025 · NeurIPS 2025 (Poster)
- WebSP-Eval: Evaluating Web Agents on Website Security and Privacy TasksApril 7, 2026 · arXiv
- It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web AgentsDecember 29, 2025 · arXiv
- Genesis: Evolving Attack Strategies for LLM Web Agent Red-TeamingOctober 21, 2025 · ICME 2026