GUI Agents Papers

The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use Agents

Xuwei Ding, Skylar Zhai, Linxin Song, Jiate Li, Taiwei Shi, Nicholas Meade, Siva Reddy, Jian Kang, Jieyu Zhao

🏛 Institutions
USC, McGill, Mila
📅 Date
April 12, 2026
📑 Publisher
arXiv
💻 Env
Desktop
🔑 Keywords
TLDR

OS-BLIND benchmarks computer-use agents under unintended attack scenarios, in which benign user instructions trigger harmful outcomes through environmental context. Most agents exceed a 90% attack success rate, and even the safety-aligned Claude 4.5 Sonnet reaches 73%. Existing safety defenses activate only at the start of a task and fail to re-engage during execution, especially when subtask decomposition obscures harmful intent.
