GUI Agents Papers

The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use Agents

Xuwei Ding, Skylar Zhai, Linxin Song, Jiate Li, Taiwei Shi, Nicholas Meade, Siva Reddy, Jian Kang, Jieyu Zhao

🏛 Institutions
USC, McGill, Mila
📅 Date
April 12, 2026
📑 Publisher
arXiv
💻 Env
Desktop
TLDR

OS-BLIND benchmarks computer-use agents under unintended attack scenarios, in which benign user instructions trigger harmful outcomes through environmental context. Most agents exceed a 90% attack success rate, and even the safety-aligned Claude 4.5 Sonnet reaches 73%. Existing safety defenses activate only at the start of a task and fail to re-engage during execution, especially when subtask decomposition obscures harmful intent.
