GUI Agents Papers

LPS-Bench: Benchmarking Safety Awareness of Computer-Use Agents in Long-Horizon Planning under Benign and Adversarial Scenarios

Tianyu Chen, Chujia Hu, Ge Gao, Ruofeng Yu, Yao Lu

🏛 Institutions
ShanghaiTech University, Shanghai AI Laboratory, Rice University
📅 Date
February 3, 2026
📑 Publisher
arXiv
💻 Env
General GUI
TLDR

LPS-Bench is a benchmark that evaluates the planning-time safety awareness of MCP-based computer-use agents on long-horizon tasks. It covers 65 scenarios across 7 task domains and 9 risk types, with both benign and adversarial interactions, and reveals substantial safety deficiencies in existing agents.
