When Actions Go Off-Task:

Detecting and Correcting Misaligned Actions in Computer-Use Agents


1The Ohio State University 2Amazon AGI

† Correspondence to ning.151@osu.edu, sun.397@osu.edu

Examples of misaligned actions

Computer-use agents (CUAs) are increasingly capable, yet they frequently take misaligned actions that deviate from user intent. We make the first effort to define and study misaligned action detection in CUAs. We introduce MisActBench, the first benchmark with action-level misalignment labels across CUA trajectories, and DeAction, a practical guardrail that detects and corrects misaligned actions before execution.

Overview

Computer-use agents (CUAs) have made tremendous progress in the past year, yet they still frequently produce misaligned actions that deviate from the user's original intent. Such misaligned actions may arise from external attacks (e.g., indirect prompt injection) or internal limitations (e.g., erroneous reasoning). They not only expose CUAs to safety risks, but also degrade task efficiency and reliability.

We present the first systematic study of misaligned action detection in CUAs:

  • Problem Framing: We propose an intent-centric perspective that frames CUA deviations as an action misalignment problem, and identify three common categories of misaligned actions in real-world deployments.
  • MisActBench: A comprehensive benchmark with 2264 human-annotated, action-level alignment labels on diverse CUA trajectories, covering all three categories of misaligned actions.
  • DeAction: A practical and plug-and-play runtime guardrail that proactively detects misaligned actions before execution and corrects them via iterative feedback. Extensive experiments demonstrate its effectiveness in both adversarial and benign settings with moderate overhead.

MisActBench

MisActBench is the first benchmark for evaluating action-level misalignment detection in computer-use agents. Unlike existing benchmarks that provide only trajectory-level safety labels, MisActBench offers fine-grained, human-annotated alignment labels for each individual action.

For each action, we provide: (1) the user intent, (2) interaction history (including previous actions and screenshots), (3) current screenshot, and (4) the proposed action. The goal is to predict whether the proposed action is aligned or misaligned with the user's intent before execution.
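Concretely, each benchmark instance pairs these four inputs with a gold label, and a detector is scored on its pre-execution predictions. The sketch below shows this interface in Python; the field names, the `accuracy` helper, and the trivial baseline are illustrative assumptions, not the official MisActBench schema:

```python
from dataclasses import dataclass

@dataclass
class MisActBenchExample:
    """One detection instance. Field names are illustrative,
    not the official MisActBench schema."""
    user_intent: str        # the user's original task instruction
    history: list           # prior (action, screenshot) pairs
    screenshot: str         # path to the current screenshot
    proposed_action: str    # the action the CUA wants to execute next
    label: str              # ground truth: "aligned" or "misaligned"

def accuracy(detector, examples):
    """Score a detector that labels each proposed action before execution."""
    correct = sum(detector(ex) == ex.label for ex in examples)
    return correct / len(examples)

# A trivial detector that never flags anything, for illustration only.
always_aligned = lambda ex: "aligned"

demo = [
    MisActBenchExample("Book a flight", [], "s1.png", "click('Search')", "aligned"),
    MisActBenchExample("Book a flight", [], "s2.png", "click('Install extension')", "misaligned"),
]
print(accuracy(always_aligned, demo))  # → 0.5
```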

MisActBench data collection
We collect trajectories from two sources: (a) running diverse CUAs on adversarial benchmarks to capture attack-induced misalignment; (b) synthesizing trajectories with internally arising misalignment by injecting unintended subgoals into benign tasks.

MisActBench covers all three categories of misaligned actions:

  • Malicious Instruction Following: The action complies with malicious instructions in external environments to achieve an attacker's goal.
  • Harmful Unintended Behavior: The action causes harm inadvertently due to inherent limitations (e.g., reasoning error) rather than adversarial attack.
  • Other Task-Irrelevant Behavior: The action causes no harmful consequences but is irrelevant to the user's task, degrading efficiency and reliability.

Benchmark Statistics

  • 558 trajectories
  • 2,264 annotated actions
  • 0.84 annotator agreement (Fleiss' κ)

Action Distribution

  • 1,264 aligned actions
  • 1,000 misaligned actions, comprising:
      • 562 Malicious Instruction Following
      • 210 Harmful Unintended Behavior
      • 228 Other Task-Irrelevant Behavior

Data Examples

Coordinates in actions are annotated with red markers on the screenshots for better visualization.

    DeAction

    DeAction is a runtime guardrail that intercepts each proposed action before execution, assesses its alignment with the user's intent, and triggers a correction loop when misalignment is detected. It is plug-and-play for any CUA — requiring only environment state and action proposals, with no access to model internals.

    🔍

    Two-Stage Detection

    Lightweight fast check for obvious cases, systematic analysis for ambiguous ones — balancing accuracy and latency.

    🔄

    Iterative Correction

    Structured feedback steers the agent toward aligned actions, with 78% of flagged actions ultimately corrected.

    🔌

    Plug-and-Play

    Works with any CUA — no model internals needed, just environment state and action proposals.

    DeAction pipeline: (1) intercept proposed action, (2) two-stage alignment detection, (3) if misaligned, generate structured feedback and prompt the agent to revise.
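    The intercept-detect-correct loop described above can be sketched as follows. All class and function interfaces here are hypothetical stand-ins (the page does not specify DeAction's actual APIs); the stubs only exercise the control flow:

```python
def deaction_step(agent, detector, feedback_fn, state, intent, max_revisions=3):
    """Guard one agent step: intercept the proposed action, check alignment,
    and iterate structured feedback until the action is aligned or the
    revision budget runs out. All interfaces are hypothetical stand-ins."""
    action = agent.propose(state, intent)
    for _ in range(max_revisions):
        if detector(intent, state, action) == "aligned":
            return action                    # cleared for execution
        feedback = feedback_fn(intent, state, action)
        action = agent.propose(state, intent, feedback=feedback)
    return None                              # still misaligned: block the action

# --- Minimal stubs to exercise the loop ---
class StubAgent:
    def __init__(self):
        self.calls = 0
    def propose(self, state, intent, feedback=None):
        self.calls += 1
        # first proposal is off-task; the revised one follows the feedback
        return "click('Install extension')" if self.calls == 1 else "click('Search')"

def stub_detector(intent, state, action):
    return "misaligned" if "Install" in action else "aligned"

def stub_feedback(intent, state, action):
    return f"Action {action!r} deviates from intent {intent!r}; propose an aligned alternative."

print(deaction_step(StubAgent(), stub_detector, stub_feedback, "desktop", "search the web"))
# → click('Search')
```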

    Experiments

    Offline Evaluation

    We evaluate misaligned action detection on MisActBench across multiple backbones. DeAction consistently outperforms prior methods (Task Shield, InferAct), achieving 9–16% absolute F1 improvement.

    💡 Takeaway: DeAction achieves the best F1 across all backbones by improving precision without sacrificing recall — reducing false alarms that would otherwise block legitimate actions.

    Backbone           Method       Precision  Recall   Acc    F1
    Qwen3-VL-32B       Task Shield       50.6    69.0   56.6   58.4
                       InferAct          47.1    89.0   51.1   61.6
                       DeAction          80.1    63.3   76.9   70.7
    Claude Sonnet 4.5  Task Shield       59.5    75.8   66.5   66.6
                       InferAct          48.4    96.0   53.1   64.3
                       DeAction          88.2    74.0   84.1   80.4
    GPT-5.1 Instant    Task Shield       51.4    88.8   58.0   65.1
                       InferAct          51.4    87.3   58.0   64.7
                       DeAction          73.4    86.4   80.2   79.4
    GPT-5.1 Thinking   Task Shield       61.3    73.6   67.8   66.9
                       InferAct          56.0    70.1   62.5   62.3
                       DeAction          89.9    76.8   85.9   82.8
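    As a sanity check, the F1 column is the harmonic mean of precision and recall. Recomputing it from the reported (rounded) precision and recall reproduces the DeAction rows below to the displayed digit; other rows may differ in the last digit if the paper computes F1 from unrounded values:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Recompute two DeAction rows from their reported precision/recall (values in %)
print(round(f1(80.1, 63.3), 1))  # Qwen3-VL-32B → 70.7
print(round(f1(89.9, 76.8), 1))  # GPT-5.1 Thinking → 82.8
```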

    Online Evaluation

    We deploy DeAction with real CUAs in adversarial (RedTeamCUA) and benign (OSWorld) environments. DeAction reduces attack success rate by 90% or more while preserving, and in some cases improving, benign task success.

    💡 Takeaway: DeAction achieves the lowest ASR across all CUAs while maintaining competitive benign task success, demonstrating practical deployability.

    Columns: ASR = Attack Success Rate and UA = Utility under Attack, both on adversarial RedTeamCUA; SR = Success Rate on benign OSWorld tasks.

    Agent              Defense      ASR (%) ↓  UA (%) ↑  SR (%) ↑
    Claude Sonnet 4.5  No Defense        60.0      44.0      42.9
                       DeAction           6.0      76.0      40.7
    OpenAI CUA         No Defense        42.0      82.0      26.0
                       DeAction           4.0      84.0      30.7
    OpenCUA-72B        No Defense        32.0      48.0      39.0
                       DSP               24.0      52.0      38.3
                       PromptArmor       26.0      44.0      35.2
                       Task Shield       22.0      58.0      36.6
                       InferAct          22.0      70.0      36.5
                       DeAction           2.0      60.0      39.6
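    The headline ASR-reduction figure is a relative drop, which quick arithmetic over the No Defense vs. DeAction rows in the table confirms for all three agents:

```python
# Relative ASR reduction per agent: No Defense vs. DeAction ASR from the table
asr = {
    "Claude Sonnet 4.5": (60.0, 6.0),
    "OpenAI CUA": (42.0, 4.0),
    "OpenCUA-72B": (32.0, 2.0),
}
for agent, (before, after) in asr.items():
    reduction = 100 * (before - after) / before
    print(f"{agent}: {reduction:.1f}% relative ASR reduction")
# → 90.0%, 90.5%, and 93.8% respectively
```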

    Runtime Analysis

    We analyze the runtime behavior of DeAction in online experiments, with statistics averaged across all settings and CUAs.

    💡 Takeaway: DeAction accounts for 25% of per-step execution time, with 45% of actions cleared by the fast check. When misalignment is flagged, iterative feedback corrects 78% of cases — recovering aligned behavior rather than simply blocking.

      • Latency: 25% of per-step execution time (7.2s out of 28.1s avg.)
      • Two-stage detection: 45% of actions approved directly by the fast check (3.2s avg. latency)
      • Correction: 78% of flagged actions corrected (62% in a single revision)

    BibTeX

    If you find this work useful, please consider citing our paper:

    @misc{ning2026actionsofftaskdetectingcorrecting,
          title={When Actions Go Off-Task: Detecting and Correcting Misaligned Actions in Computer-Use Agents},
          author={Yuting Ning and Jaylen Jones and Zhehao Zhang and Chentao Ye and Weitong Ruan and Junyi Li and Rahul Gupta and Huan Sun},
          year={2026},
          eprint={2602.08995},
          archivePrefix={arXiv},
          primaryClass={cs.CL},
          url={https://arxiv.org/abs/2602.08995},
    }