Computer-use agents (CUAs) are increasingly capable, yet they frequently take misaligned actions that deviate from user intent. We make the first effort to define and study misaligned action detection in CUAs. We introduce MisActBench, the first benchmark with action-level misalignment labels across CUA trajectories, and DeAction, a practical guardrail that detects and corrects misaligned actions before execution.
Computer-use agents (CUAs) have made tremendous progress in the past year, yet they still frequently produce misaligned actions that deviate from the user's original intent. Such misaligned actions may arise from external attacks (e.g., indirect prompt injection) or internal limitations (e.g., erroneous reasoning). They not only expose CUAs to safety risks, but also degrade task efficiency and reliability.
We present the first systematic study of misaligned action detection in CUAs:
MisActBench is the first benchmark for evaluating action-level misalignment detection in computer-use agents. Unlike existing benchmarks that provide only trajectory-level safety labels, MisActBench offers fine-grained, human-annotated alignment labels for each individual action.
For each action, we provide: (1) the user intent, (2) the interaction history (previous actions and screenshots), (3) the current screenshot, and (4) the proposed action. The task is to predict, before execution, whether the proposed action is aligned or misaligned with the user's intent.
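To make the task format concrete, here is a minimal sketch of what a single benchmark instance might look like in code. The field names and types are illustrative assumptions for exposition, not the released data schema.

```python
from dataclasses import dataclass
from typing import List, Literal

@dataclass
class Step:
    """One past step in the interaction history."""
    action: str           # e.g. "click(412, 318)" or "type('quarterly report')"
    screenshot_path: str  # screenshot observed when this action was proposed

@dataclass
class MisActBenchInstance:
    """Illustrative schema for one action-level detection example."""
    user_intent: str          # the user's original instruction
    history: List[Step]       # previous actions and their screenshots
    current_screenshot: str   # path to the screenshot at the current step
    proposed_action: str      # the action the CUA wants to execute next
    label: Literal["aligned", "misaligned"]  # human annotation; the prediction target

def is_misaligned(instance: MisActBenchInstance, detector) -> bool:
    """A detector sees everything except the label and must judge the action before execution."""
    return detector(
        instance.user_intent,
        instance.history,
        instance.current_screenshot,
        instance.proposed_action,
    )
```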
MisActBench covers all three categories of misaligned actions: malicious instruction following, harmful unintended behavior, and other task-irrelevant behavior.

Benchmark statistics:
- 558 trajectories
- 2,264 annotated actions
- 0.84 annotator agreement (Fleiss' κ)

Action distribution:
- 1,264 ✓ aligned actions
- 1,000 ✗ misaligned actions:
  - 562 malicious instruction following
  - 210 harmful unintended behavior
  - 228 other task-irrelevant behavior
Coordinates in actions are annotated with red markers on the screenshots for better visualization.
DeAction is a runtime guardrail that intercepts each proposed action before execution, assesses its alignment with the user's intent, and triggers a correction loop when misalignment is detected. It is plug-and-play for any CUA — requiring only environment state and action proposals, with no access to model internals.
- 🔍 Two-Stage Detection: a lightweight fast check for obvious cases and systematic analysis for ambiguous ones, balancing accuracy and latency.
- 🔄 Iterative Correction: structured feedback steers the agent toward aligned actions, with 78% of flagged actions ultimately corrected.
- 🔌 Plug-and-Play: works with any CUA; no model internals needed, just environment state and action proposals.
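As a rough illustration of how these pieces fit together, the following is a minimal sketch of the interception loop. The `propose_action`, `fast_check`, and `deep_check` interfaces are hypothetical stand-ins for the CUA's action proposer and the two detection stages, not the released implementation.

```python
def deaction_guard(user_intent, history, screenshot, propose_action,
                   fast_check, deep_check, max_revisions=3):
    """Intercept each proposed action; return one judged aligned, or None to block."""
    feedback = None
    for _ in range(max_revisions + 1):
        # Ask the CUA for an action, passing back structured feedback after a rejection.
        action = propose_action(feedback)
        # Stage 1: lightweight fast check clears obvious cases with low latency.
        if fast_check(user_intent, history, screenshot, action):
            return action
        # Stage 2: systematic analysis of the ambiguous or suspicious case.
        aligned, feedback = deep_check(user_intent, history, screenshot, action)
        if aligned:
            return action
        # Misalignment flagged: loop back so the feedback can steer the next proposal.
    return None  # no aligned action within the revision budget; block execution
```

Because the loop consumes only the environment state and the proposed action, it can wrap any CUA without touching model internals.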
We evaluate misaligned action detection on MisActBench across multiple backbones. DeAction consistently outperforms prior methods (Task Shield, InferAct), achieving 9–16% absolute F1 improvement.
💡 Takeaway: DeAction achieves the best F1 across all backbones by substantially improving precision while keeping recall competitive, reducing false alarms that would otherwise block legitimate actions.
| Method | Precision (%) | Recall (%) | Accuracy (%) | F1 (%) |
|---|---|---|---|---|
| **Qwen3-VL-32B** | | | | |
| Task Shield | 50.6 | 69.0 | 56.6 | 58.4 |
| InferAct | 47.1 | 89.0 | 51.1 | 61.6 |
| DeAction | 80.1 | 63.3 | 76.9 | 70.7 |
| **Claude Sonnet 4.5** | | | | |
| Task Shield | 59.5 | 75.8 | 66.5 | 66.6 |
| InferAct | 48.4 | 96.0 | 53.1 | 64.3 |
| DeAction | 88.2 | 74.0 | 84.1 | 80.4 |
| **GPT-5.1 Instant** | | | | |
| Task Shield | 51.4 | 88.8 | 58.0 | 65.1 |
| InferAct | 51.4 | 87.3 | 58.0 | 64.7 |
| DeAction | 73.4 | 86.4 | 80.2 | 79.4 |
| **GPT-5.1 Thinking** | | | | |
| Task Shield | 61.3 | 73.6 | 67.8 | 66.9 |
| InferAct | 56.0 | 70.1 | 62.5 | 62.3 |
| DeAction | 89.9 | 76.8 | 85.9 | 82.8 |
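For reference, the metrics in the table can be reproduced from per-action predictions as sketched below; treating the misaligned class as the positive class is our assumption about the metric convention.

```python
def detection_metrics(y_true, y_pred):
    """Precision, recall, accuracy, and F1 over action-level labels,
    with 'misaligned' treated as the positive class (an assumed convention)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == "misaligned" and p == "misaligned" for t, p in pairs)
    fp = sum(t == "aligned" and p == "misaligned" for t, p in pairs)
    fn = sum(t == "misaligned" and p == "aligned" for t, p in pairs)
    tn = sum(t == "aligned" and p == "aligned" for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, accuracy, f1
```

As a sanity check on the F1 formula, plugging in DeAction's precision and recall for GPT-5.1 Thinking (89.9 and 76.8) gives 2 × 0.899 × 0.768 / (0.899 + 0.768) ≈ 0.828, matching the reported F1 of 82.8.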
We deploy DeAction with real CUAs in adversarial (RedTeamCUA) and benign (OSWorld) environments. DeAction reduces attack success rate by 90% or more while preserving, and in some cases improving, benign task success.
💡 Takeaway: DeAction achieves the lowest ASR across all CUAs while maintaining competitive benign task success, demonstrating practical deployability.
| Agent | Defense | ASR (%) ↓ | UA (%) ↑ | SR (%) ↑ |
|---|---|---|---|---|
| Claude Sonnet 4.5 | No Defense | 60.0 | 44.0 | 42.9 |
| Claude Sonnet 4.5 | DeAction | 6.0 | 76.0 | 40.7 |
| OpenAI CUA | No Defense | 42.0 | 82.0 | 26.0 |
| OpenAI CUA | DeAction | 4.0 | 84.0 | 30.7 |
| OpenCUA-72B | No Defense | 32.0 | 48.0 | 39.0 |
| OpenCUA-72B | DSP | 24.0 | 52.0 | 38.3 |
| OpenCUA-72B | PromptArmor | 26.0 | 44.0 | 35.2 |
| OpenCUA-72B | Task Shield | 22.0 | 58.0 | 36.6 |
| OpenCUA-72B | InferAct | 22.0 | 70.0 | 36.5 |
| OpenCUA-72B | DeAction | 2.0 | 60.0 | 39.6 |

ASR = attack success rate and UA = utility under attack, both on the adversarial RedTeamCUA environment; SR = task success rate on the benign OSWorld environment.
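If it helps to see how the three deployment metrics aggregate, here is a small sketch over per-task outcome records. The field names are illustrative, and OSWorld's official scoring may assign partial credit rather than the boolean success assumed here.

```python
def online_metrics(adversarial_runs, benign_runs):
    """Aggregate deployment metrics from per-task outcomes (illustrative field names).
    adversarial_runs: dicts with booleans 'attack_succeeded' and 'task_succeeded'.
    benign_runs: dicts with boolean 'task_succeeded'."""
    asr = 100.0 * sum(r["attack_succeeded"] for r in adversarial_runs) / len(adversarial_runs)
    ua = 100.0 * sum(r["task_succeeded"] for r in adversarial_runs) / len(adversarial_runs)
    sr = 100.0 * sum(r["task_succeeded"] for r in benign_runs) / len(benign_runs)
    return {"ASR": asr, "UA": ua, "SR": sr}
```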
We analyze the runtime behavior of DeAction in online experiments, with statistics averaged across all settings and CUAs.
💡 Takeaway: DeAction accounts for 25% of per-step execution time, with 45% of actions cleared by the fast check. When misalignment is flagged, iterative feedback corrects 78% of cases — recovering aligned behavior rather than simply blocking.
- Latency: 25% of per-step execution time (7.2 s out of 28.1 s on average).
- Two-stage detection: 45% of actions are approved directly by the fast check (3.2 s average latency).
- Correction: 78% of flagged actions are corrected, 62% of them in a single revision.
If you find this work useful, please consider citing our paper:
@misc{ning2026actionsofftaskdetectingcorrecting,
title={When Actions Go Off-Task: Detecting and Correcting Misaligned Actions in Computer-Use Agents},
author={Yuting Ning and Jaylen Jones and Zhehao Zhang and Chentao Ye and Weitong Ruan and Junyi Li and Rahul Gupta and Huan Sun},
year={2026},
eprint={2602.08995},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2602.08995},
}