WebSP-Eval: Evaluating Web Agents on Website Security and Privacy Tasks
Guruprasad Viswanathan Ramesh, Asmit Nayak, Basieem Siddique, Kassem Fawaz
- 🏛 Institutions
- UW-Madison
- 📅 Date
- April 7, 2026
- 📑 Publisher
- arXiv
- 💻 Env
- Web
- 🔑 Keywords
TLDR
WebSP-Eval is the first framework evaluating web agents on user-facing website security and privacy tasks such as cookie preferences, privacy settings, and session revocation. Across 200 task instances on 28 websites, agents fail more than 45% on tasks with stateful UI elements like toggles and checkboxes.
Related papers
- WebPII: Benchmarking Visual PII Detection for Computer-Use AgentsMarch 18, 2026 · arXiv
- HackWorld: Evaluating Computer-Use Agents on Exploiting Web Application VulnerabilitiesOctober 14, 2025 · ICLR 2026 (Poster)
- RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS EnvironmentsMay 28, 2025 · ICLR 2026 (Oral)
- WASP: Benchmarking Web Agent Security Against Prompt Injection AttacksApril 22, 2025 · NeurIPS 2025 (Poster)
- AgentDAM: Privacy Leakage Evaluation for Autonomous Web AgentsMarch 12, 2025 · NeurIPS 2025 Datasets and Benchmarks Track (Poster)
- The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use AgentsApril 12, 2026 · arXiv