MobileSafetyBench: Evaluating Safety of Autonomous Agents in Mobile Device Control

Juyong Lee, Dongyoon Hahm, June Suk Choi, W. Bradley Knox, Kimin Lee

🏛 Institutions: KAIST, UT Austin
📅 Date: October 23, 2024
📑 Publisher: arXiv
💻 Env: Mobile
🔑 Keywords: benchmark, safety, prompt injection, Android emulator, MobileSafetyBench

TLDR

Introduces MobileSafetyBench, a benchmark for measuring safety failures of mobile-control agents in realistic Android tasks involving apps like messaging and banking. It evaluates both ordinary safety behavior and robustness to indirect prompt injection, and shows that current agents still struggle to avoid harmful actions.
