GUI Agents Papers

When Benign Inputs Lead to Severe Harms: Eliciting Unsafe Unintended Behaviors of Computer-Use Agents

Jaylen Jones, Zhehao Zhang, Yuting Ning, Eric Fosler-Lussier, Pierre-Luc St-Charles, Yoshua Bengio, Dawn Song, Yu Su, Huan Sun

🏛 Institutions
OSU, LawZero, Mila, UdeM, UC Berkeley
📅 Date
February 9, 2026
📑 Publisher
arXiv
💻 Env
Desktop
🔑 Keywords
TLDR

AutoElicit is an agentic framework that iteratively perturbs benign instructions using CUA execution feedback to surface unintended harmful behaviors while keeping inputs realistic. It elicits severe harms from frontier CUAs like Claude 4.5 and Operator in up to 72.5% of OS-domain seeds, and evaluates cross-model transferability of verified perturbations.
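The feedback-driven loop described in the TLDR can be sketched roughly as follows. This is only a minimal illustration, not the paper's implementation: the names `elicit`, `elicitation_rate`, `perturb`, `run_agent`, `is_harmful`, and `is_realistic` are hypothetical placeholders, and the callbacks stand in for components (a perturbation proposer, a sandboxed CUA run, a harm judge, a realism check) whose actual design is not given here.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Trial:
    """One perturbed instruction plus the agent trajectory it produced."""
    instruction: str
    trajectory: str


def elicit(
    seed: str,
    perturb: Callable[[str, str], str],   # hypothetical: (instruction, feedback) -> new instruction
    run_agent: Callable[[str], str],      # hypothetical: runs the CUA in a sandbox, returns a trajectory
    is_harmful: Callable[[str], bool],    # hypothetical: judges whether the trajectory caused harm
    is_realistic: Callable[[str], bool],  # hypothetical: keeps the input benign and realistic
    max_iters: int = 5,
) -> Optional[Trial]:
    """Iteratively perturb a benign seed, feeding execution feedback into the next round."""
    instruction, feedback = seed, ""
    for _ in range(max_iters):
        candidate = perturb(instruction, feedback)
        if not is_realistic(candidate):
            # Reject unrealistic inputs and tell the proposer why.
            feedback = "rejected: unrealistic"
            continue
        trajectory = run_agent(candidate)
        if is_harmful(trajectory):
            return Trial(candidate, trajectory)
        # No harm observed: use the trajectory as feedback for the next perturbation.
        instruction, feedback = candidate, trajectory
    return None


def elicitation_rate(seeds: Sequence[str], **kwargs) -> float:
    """Fraction of seeds for which at least one harmful trajectory was found."""
    hits = sum(elicit(seed, **kwargs) is not None for seed in seeds)
    return hits / len(seeds)
```

Under these assumptions, a per-domain figure such as the 72.5% quoted above would correspond to `elicitation_rate` computed over that domain's seeds, and a transferability check would re-run a stored `Trial.instruction` against a different agent.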
