GUI Agents Papers

Refusal-Trained LLMs Are Easily Jailbroken As Browser Agents

Priyanshu Kumar, Elaine Lau, Saranya Vijayakumar, Tu Trinh, Scale Red Team, Elaine Chang, Vaughn Robinson, Sean Hendryx, Shuyan Zhou, Matt Fredrikson, Summer Yue, Zifan Wang

🏛 Institutions
CMU, Scale AI, GraySwan AI
📅 Date
October 11, 2024
📑 Publisher
arXiv
💻 Env
Web
TLDR

The paper introduces BrowserART, a red-teaming benchmark of 100 harmful browser-agent behaviors spanning synthetic and real websites. It shows that backbone LLMs trained to refuse harmful instructions in chat may still carry them out once embedded in browser agents, and that jailbreak techniques designed for chat transfer effectively to the agent setting.
