GUI Agents Papers

MobileSafetyBench: Evaluating Safety of Autonomous Agents in Mobile Device Control

Juyong Lee, Dongyoon Hahm, June Suk Choi, W. Bradley Knox, Kimin Lee

🏛 Institutions
KAIST, UT Austin
📅 Date
October 23, 2024
📑 Publisher
arXiv
💻 Env
Mobile
TLDR

Introduces MobileSafetyBench, a benchmark for measuring safety failures of mobile-control agents in realistic Android tasks involving apps like messaging and banking. It evaluates both ordinary safety behavior and robustness to indirect prompt injection, and shows that current agents still struggle to avoid harmful actions.
