Are GUI Agents Focused Enough? Automated Distraction via Semantic-level UI Element Injection
Wenkui Yang, Chao Jin, Haisu Zhu, Weilin Luo, Derek Yuen, Kun Shao, Huaibo Huang, Junxian Duan, Jie Cao, Ran He
- 🏛 Institutions
- UCAS, CASIA, Huawei, ShanghaiTech
- 📅 Date
- April 9, 2026
- 📑 Publisher
- arXiv
- 💻 Env
- General GUI
- 🔑 Keywords
TLDR
This paper proposes Semantic-level UI Element Injection, a red-teaming method that overlays safety-aligned UI elements onto screenshots to misdirect the visual grounding of GUI agents. Using a modular Editor-Overlapper-Victim pipeline with iterative search, the optimized attacks achieve up to 4.4x the attack success rate of random injection and transfer across models.
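To make the Editor-Overlapper-Victim loop concrete, here is a toy sketch of the general idea. It is not the authors' implementation. Every name below is hypothetical: `editor`, `overlapper`, `victim`, `search_injection`, and the word-overlap scoring. The code makes three simplifications. The Editor appends words from a fixed vocabulary rather than calling an LLM. The "screenshot" is a list of text elements with bounding boxes. The Victim is a stand-in grounding rule that clicks the element sharing the most words with the instruction. It also assumes that the injected box does not overlap any original element, so a click inside it counts as a successful distraction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A UI element: visible text plus a bounding box (x0, y0, x1, y1)."""
    text: str
    box: tuple[int, int, int, int]

    def center(self) -> tuple[int, int]:
        x0, y0, x1, y1 = self.box
        return ((x0 + x1) // 2, (y0 + y1) // 2)

    def contains(self, point: tuple[int, int]) -> bool:
        x0, y0, x1, y1 = self.box
        return x0 <= point[0] <= x1 and y0 <= point[1] <= y1


def editor(seed_text: str, vocab: list[str]) -> list[str]:
    """Hypothetical Editor: propose variants by appending one vocabulary word."""
    return [f"{seed_text} {w}".strip() for w in vocab]


def overlapper(screen: list[Element], text: str, box) -> list[Element]:
    """Hypothetical Overlapper: paste the candidate element onto a copy of the screen."""
    return screen + [Element(text, box)]


def victim(screen: list[Element], instruction: str) -> tuple[int, int]:
    """Toy Victim (not a real agent): click the center of the element whose text
    shares the most words with the instruction; ties go to the earliest element."""
    words = set(instruction.lower().split())
    best = max(screen, key=lambda e: len(words & set(e.text.lower().split())))
    return best.center()


def search_injection(screen, instruction, vocab, box, max_rounds=5):
    """Greedy iterative search: expand the best-scoring text each round and stop
    as soon as the victim's click lands on the injected element."""
    instr_words = set(instruction.lower().split())
    best_text, best_score = "", -1
    for rnd in range(1, max_rounds + 1):
        for text in editor(best_text, vocab):
            attacked = overlapper(screen, text, box)
            if attacked[-1].contains(victim(attacked, instruction)):
                return text, True, rnd  # victim distracted
            # Proxy score guiding the search: word overlap with the instruction.
            score = len(instr_words & set(text.lower().split()))
            if score > best_score:
                best_text, best_score = text, score
    return best_text, False, max_rounds


screen = [Element("Submit order", (0, 0, 100, 40)), Element("Cancel", (0, 50, 100, 90))]
print(search_injection(screen, "click the submit order button",
                       ["submit", "order", "button", "click", "safe"], (200, 0, 300, 40)))
# → ('submit order button', True, 3)
```

Under the toy Victim, ties go to the original element, so the injected element wins only when its text strictly out-scores the real target. That is why the search needs a third round in this example.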
Related papers
- When Benign Inputs Lead to Severe Harms: Eliciting Unsafe Unintended Behaviors of Computer-Use Agents · February 9, 2026 · arXiv
- AdvAgent: Controllable Blackbox Red-teaming on Web Agents · October 22, 2024 · ICML 2025 (Poster)
- Refusal-Trained LLMs Are Easily Jailbroken As Browser Agents · October 11, 2024 · arXiv
- CocoaBench: Evaluating Unified Digital Agents in the Wild · April 13, 2026 · arXiv
- LPS-Bench: Benchmarking Safety Awareness of Computer-Use Agents in Long-Horizon Planning under Benign and Adversarial Scenarios · February 3, 2026 · arXiv
- SafePred: A Predictive Guardrail for Computer-Using Agents via World Models · February 2, 2026 · arXiv