MobileSafetyBench: Evaluating Safety of Autonomous Agents in Mobile Device Control
Juyong Lee, Dongyoon Hahm, June Suk Choi, W. Bradley Knox, Kimin Lee
- 🏛 Institutions
- KAIST, UT Austin
- 📅 Date
- October 23, 2024
- 📑 Publisher
- arXiv
- 💻 Env
- Mobile
- 🔑 Keywords
TLDR
Introduces MobileSafetyBench, a benchmark for measuring safety failures of mobile-control agents in realistic Android tasks involving apps like messaging and banking. It evaluates both ordinary safety behavior and robustness to indirect prompt injection, and shows that current agents still struggle to avoid harmful actions.
Related papers
- It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web AgentsDecember 29, 2025 · arXiv
- CORA: Conformal Risk-Controlled Agents for Safeguarded Mobile GUI AutomationApril 10, 2026 · arXiv
- Blind Gods and Broken Screens: Architecting a Secure, Intent-Centric Mobile Agent Operating SystemFebruary 11, 2026 · arXiv
- The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use AgentsApril 12, 2026 · arXiv
- When Benign Inputs Lead to Severe Harms: Eliciting Unsafe Unintended Behaviors of Computer-Use AgentsFebruary 9, 2026 · arXiv
- When Actions Go Off-Task: Detecting and Correcting Misaligned Actions in Computer-Use AgentsFebruary 9, 2026 · arXiv