You Told Me to Do It: Measuring Instructional Text-induced Private Data Leakage in LLM Agents
Ching-Yu Kao, Xinfeng Li, Shenyu Dai, Tianze Qiu, Pengcheng Zhou, Eric Hanchen Jiang, Philip Sperl
- 🏛 Institutions
- Fraunhofer AISEC, NTU, KTH, NUS, UCLA
- 📅 Date
- March 12, 2026
- 📑 Publisher
- arXiv
- 💻 Env
- 🔑 Keywords
TLDR
This paper studies documentation-embedded instruction injection in high-privilege LLM agents and frames the failure mode as the Trusted Executor Dilemma. It introduces ReadSecBench, shows exfiltration success up to 85%, and finds that both rule-based and LLM-based defenses still fail to catch the attacks reliably.
Related papers
- WebSP-Eval: Evaluating Web Agents on Website Security and Privacy TasksApril 7, 2026 · arXiv
- The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use AgentsApril 12, 2026 · arXiv
- Preference Redirection via Attention Concentration: An Attack on Computer Use AgentsApril 9, 2026 · arXiv
- AgentRAE: Remote Action Execution through Notification-based Visual Backdoors against Screenshots-based Mobile GUI AgentsMarch 24, 2026 · arXiv
- WebPII: Benchmarking Visual PII Detection for Computer-Use AgentsMarch 18, 2026 · arXiv
- Visual Confused Deputy: Exploiting and Defending Perception Failures in Computer-Using AgentsMarch 16, 2026 · arXiv