Refusal-Trained LLMs Are Easily Jailbroken As Browser Agents

Priyanshu Kumar, Elaine Lau, Saranya Vijayakumar, Tu Trinh, Scale Red Team, Elaine Chang, Vaughn Robinson, Sean Hendryx, Shuyan Zhou, Matt Fredrikson, Summer Yue, Zifan Wang

🏛 Institutions: CMU, Scale AI, GraySwan AI
📅 Date: October 11, 2024
📑 Publisher: arXiv
💻 Env: Web
🔑 Keywords: benchmark, safety, red teaming, jailbreaking, BrowserART

TLDR

The paper introduces BrowserART, a red-teaming benchmark of 100 harmful browser-agent behaviors spanning synthetic and real websites. It shows that backbone LLMs trained to refuse harmful requests in chat may still carry out those requests once embedded in browser agents, and that jailbreak attacks designed for chat transfer effectively to the agent setting.
