WebSP-Eval: Evaluating Web Agents on Website Security and Privacy Tasks

Guruprasad Viswanathan Ramesh, Asmit Nayak, Basieem Siddique, Kassem Fawaz

🏛 Institutions: UW-Madison
📅 Date: April 7, 2026
📑 Publisher: arXiv
💻 Env: Web
🔑 Keywords: benchmark, security, privacy, WebSP-Eval

TLDR

WebSP-Eval is the first framework for evaluating web agents on user-facing website security and privacy tasks, such as managing cookie preferences, adjusting privacy settings, and revoking sessions. Across 200 task instances on 28 websites, agents fail more than 45% of the time on tasks involving stateful UI elements such as toggles and checkboxes.
