SusBench: An Online Benchmark for Evaluating Dark Pattern Susceptibility of Computer-Use Agents
Longjie Guo , Chenjie Yuan , Mingyuan Zhong , Robert Wolfe , Ruican Zhong , Yue Xu , Bingbing Wen , Hua Shen , Lucy Lu Wang , Alexis Hiniker
- 🏛 Institutions
- University of Washington , Rutgers University , CMU , New York University Shanghai
- 📅 Date
- October 13, 2025
- 📑 Publisher
- IUI 2026
- 💻 Env
- Web
- 🔑 Keywords
TLDR
SusBench injects nine dark pattern types into 55 real consumer websites and evaluates five computer-use agents alongside 29 human participants across 313 tasks. It finds both agents and humans especially susceptible to preselection, trick wording, and hidden information, while other overt dark patterns are easier to resist.
Related papers (24)
- DECEPTICON: How Dark Patterns Manipulate Web AgentsDecember 28, 2025 · arXiv
- Investigating the Impact of Dark Patterns on LLM-Based Web AgentsOctober 20, 2025 · IEEE S&P 2026
- Odysseys: Benchmarking Web Agents on Realistic Long Horizon TasksApril 27, 2026 · arXiv
- WebForge: Breaking the Realism-Reproducibility-Scalability Trilemma in Browser Agent BenchmarkApril 13, 2026 · arXiv
- The Amazing Agent Race: Strong Tool Users, Weak NavigatorsApril 11, 2026 · arXiv
- ClawBench: Can AI Agents Complete Everyday Online Tasks?April 9, 2026 · arXiv
- GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game AgentsApril 8, 2026 · arXiv
- WebSP-Eval: Evaluating Web Agents on Website Security and Privacy TasksApril 7, 2026 · arXiv
- The Art of Building Verifiers for Computer Use AgentsApril 5, 2026 · arXiv
- When Users Change Their Mind: Evaluating Interruptible Agents in Long-Horizon Web NavigationApril 1, 2026 · arXiv
- WebArena-Infinity: Generating Browser Environments with Verifiable Tasks at ScaleMarch 2026 · Blog Post
- Vision2Web: A Hierarchical Benchmark for Visual Website Development with Agent VerificationMarch 27, 2026 · arXiv
- WebTestBench: Evaluating Computer-Use Agents towards End-to-End Automated Web TestingMarch 26, 2026 · arXiv
- Ego2Web: A Web Agent Benchmark Grounded in Egocentric VideosMarch 23, 2026 · CVPR 2026
- WebPII: Benchmarking Visual PII Detection for Computer-Use AgentsMarch 18, 2026 · arXiv
- WebChain: A Large-Scale Human-Annotated Dataset of Real-World Web Interaction TracesMarch 5, 2026 · arXiv
- TimeWarp: Evaluating Web Agents by Revisiting the PastMarch 5, 2026 · arXiv
- Persona2Web: Benchmarking Personalized Web Agents for Contextual Reasoning with User HistoryFebruary 19, 2026 · arXiv
- PATHWAYS: Evaluating Investigation and Context Discovery in AI Web AgentsFebruary 5, 2026 · arXiv
- WebTrap Park: An Automated Platform for Systematic Security Evaluation of Web AgentsJanuary 13, 2026 · arXiv
- WebGym: Scaling Training Environments for Visual Web Agents with Realistic TasksJanuary 5, 2026 · arXiv
- It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web AgentsDecember 29, 2025 · arXiv
- VenusBench-GD: A Comprehensive Multi-Platform GUI Benchmark for Diverse Grounding TasksDecember 18, 2025 · arXiv
- OS-Oracle: A Comprehensive Framework for Cross-Platform GUI Critic ModelsDecember 18, 2025 · arXiv