When Actions Go Off-Task: Detecting and Correcting Misaligned Actions in Computer-Use Agents
Yuting Ning, Jaylen Jones, Zhehao Zhang, Chentao Ye, Weitong Ruan, Junyi Li, Rahul Gupta, Huan Sun
- 🏛 Institutions: OSU, Amazon AGI
- 📅 Date: February 9, 2026
- 📑 Publisher: arXiv
- 💻 Env: Desktop
- 🔑 Keywords
TLDR
This paper introduces MisActBench, a benchmark of 2,264 human-annotated action-level alignment labels covering three categories of misaligned actions: malicious instruction following, harmful unintended behavior, and task-irrelevant actions. It also proposes DeAction, a two-stage guardrail that detects misaligned actions before execution and iteratively corrects them, improving F1 by more than 15% over baselines and reducing attack success rate by over 90%.
Related papers
- The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use Agents · April 12, 2026 · arXiv
- Gym-Anything: Turn any Software into an Agent Environment · April 7, 2026 · arXiv
- When Benign Inputs Lead to Severe Harms: Eliciting Unsafe Unintended Behaviors of Computer-Use Agents · February 9, 2026 · arXiv
- ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands · December 31, 2025 · arXiv
- GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents · November 6, 2025 · arXiv
- NaturalGAIA: Pushing the Frontiers of GUI Agents with a Challenging Benchmark and High-Quality Trajectory Dataset · August 2, 2025 · arXiv