Are GUI Agents Focused Enough? Automated Distraction via Semantic-level UI Element Injection

Wenkui Yang, Chao Jin, Haisu Zhu, Weilin Luo, Derek Yuen, Kun Shao, Huaibo Huang, Junxian Duan, Jie Cao, Ran He

🏛 Institutions: UCAS, CASIA, Huawei, ShanghaiTech
📅 Date: April 9, 2026
📑 Publisher: arXiv
💻 Env: General GUI
🔑 Keywords: safety, red teaming, visual grounding, UI injection

TLDR

This paper proposes Semantic-level UI Element Injection, a red-teaming method that overlays safety-aligned UI elements onto screenshots to misdirect GUI agents' visual grounding. Using a modular Editor-Overlapper-Victim pipeline with iterative search, optimized attacks improve attack success rate by up to 4.4x over random injection and transfer across models.
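The Editor-Overlapper-Victim loop summarized above can be pictured with a toy sketch. This is not the paper's implementation: every name here (`Box`, `editor`, `overlapper`, `toy_victim`, `iterative_search`) is hypothetical, the "screenshot" is reduced to a list of labelled boxes, and the victim is a trivial label matcher standing in for a real GUI agent. The Editor here also samples at random, whereas the paper's search is optimized.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x: int
    y: int
    w: int
    h: int

    def contains(self, px, py):
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h

    def center(self):
        return (self.x + self.w // 2, self.y + self.h // 2)


def editor(rng, screen_w, screen_h):
    """Hypothetical Editor: propose a benign-looking element (label, box)."""
    label = rng.choice(["Continue", "Submit", "OK", "Next"])
    w, h = 80, 30
    return label, Box(rng.randrange(screen_w - w), rng.randrange(screen_h - h), w, h)


def overlapper(elements, injected):
    """Hypothetical Overlapper: place the injected element on top of the screen."""
    return elements + [injected]


def toy_victim(elements, instruction):
    """Stand-in agent: clicks the topmost element whose label appears in the instruction."""
    matches = [box for label, box in elements if label.lower() in instruction.lower()]
    return matches[-1].center() if matches else (0, 0)


def iterative_search(elements, instruction, target, budget=50, seed=0):
    """Try candidates until the victim's click leaves the true target box."""
    rng = random.Random(seed)
    for step in range(1, budget + 1):
        candidate = editor(rng, 400, 300)
        click = toy_victim(overlapper(elements, candidate), instruction)
        if not target.contains(*click):
            return candidate, step  # grounding was misdirected
    return None, budget
```

In this toy setting an attack counts as successful when the victim's click falls outside the true target; a real evaluation would query an actual agent on rendered screenshots.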
