The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use Agents

Xuwei Ding, Skylar Zhai, Linxin Song, Jiate Li, Taiwei Shi, Nicholas Meade, Siva Reddy, Jian Kang, Jieyu Zhao

🏛 Institutions: USC, McGill, Mila
📅 Date: April 12, 2026
📑 Publisher: arXiv
💻 Env: Desktop
🔑 Keywords: benchmark, safety, security, unintended attacks, OS-BLIND

TLDR

OS-BLIND benchmarks computer-use agents under unintended attack scenarios, in which benign user instructions trigger harmful outcomes through environmental context. Most agents exceed a 90% attack success rate, and even the safety-aligned Claude 4.5 Sonnet reaches 73%. Existing safety defenses activate only at the start of a task and fail to re-engage during execution, especially when subtask decomposition obscures harmful intent.
