When Actions Go Off-Task: Detecting and Correcting Misaligned Actions in Computer-Use Agents

Yuting Ning, Jaylen Jones, Zhehao Zhang, Chentao Ye, Weitong Ruan, Junyi Li, Rahul Gupta, Huan Sun

🏛 Institutions: OSU, Amazon AGI
📅 Date: February 9, 2026
📑 Publisher: arXiv
💻 Env: Desktop
🔑 Keywords: safety, guardrail, misaligned actions, benchmark, dataset, MisActBench, DeAction

TLDR

This paper introduces MisActBench, a benchmark of 2,264 human-annotated action-level alignment labels covering three kinds of misaligned action: malicious instruction following, harmful unintended behavior, and task-irrelevant actions. It also proposes DeAction, a two-stage guardrail that detects misaligned actions before execution and iteratively corrects them, improving F1 by more than 15% over baselines and reducing attack success rate by over 90%.
