When Actions Go Off-Task:

Detecting and Correcting Misaligned Actions in Computer-Use Agents


1The Ohio State University 2Amazon AGI

† Correspondence to ning.151@osu.edu, sun.397@osu.edu

Examples of misaligned actions

Computer-use agents (CUAs) are increasingly capable, yet they frequently take misaligned actions that deviate from user intent. We make the first effort to define and study misaligned action detection in CUAs. We introduce MisActBench, the first benchmark with action-level misalignment labels across CUA trajectories, and DeAction, a practical guardrail that detects and corrects misaligned actions before execution.

Overview

Computer-use agents (CUAs) have made tremendous progress in the past year, yet they still frequently produce misaligned actions that deviate from the user's original intent. Such misaligned actions may arise from external attacks (e.g., indirect prompt injection) or internal limitations (e.g., erroneous reasoning). They not only expose CUAs to safety risks, but also degrade task efficiency and reliability.

We present the first systematic study of misaligned action detection in CUAs:

  • Problem Framing: We propose an intent-centric perspective that frames CUA deviations as an action misalignment problem, and identify three common categories of misaligned actions in real-world deployments.
  • MisActBench: A comprehensive benchmark with 2264 human-annotated, action-level alignment labels on diverse CUA trajectories, covering all three categories of misaligned actions.
  • DeAction: A practical and plug-and-play runtime guardrail that proactively detects misaligned actions before execution and corrects them via iterative feedback. Extensive experiments demonstrate its effectiveness in both adversarial and benign settings with moderate overhead.

MisActBench

MisActBench is the first benchmark for evaluating action-level misalignment detection in computer-use agents. Unlike existing benchmarks that provide only trajectory-level safety labels, MisActBench offers fine-grained, human-annotated alignment labels for each individual action.

For each action, we provide: (1) the user intent, (2) interaction history (including previous actions and screenshots), (3) current screenshot, and (4) the proposed action. The goal is to predict whether the proposed action is aligned or misaligned with the user's intent before execution.
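Concretely, each benchmark instance pairs these four inputs with a gold label, and a detector is scored on its pre-execution predictions. The sketch below shows this interface in Python; the field names, the `accuracy` helper, and the trivial baseline are illustrative assumptions, not the official MisActBench schema:

```python
from dataclasses import dataclass

@dataclass
class MisActBenchExample:
    """One detection instance. Field names are illustrative,
    not the official MisActBench schema."""
    user_intent: str        # the user's original task instruction
    history: list           # prior (action, screenshot) pairs
    screenshot: str         # path to the current screenshot
    proposed_action: str    # the action the CUA wants to execute next
    label: str              # ground truth: "aligned" or "misaligned"

def accuracy(detector, examples):
    """Score a detector that labels each proposed action before execution."""
    correct = sum(detector(ex) == ex.label for ex in examples)
    return correct / len(examples)

# A trivial detector that never flags anything, for illustration only.
always_aligned = lambda ex: "aligned"

demo = [
    MisActBenchExample("Book a flight", [], "s1.png", "click('Search')", "aligned"),
    MisActBenchExample("Book a flight", [], "s2.png", "click('Install extension')", "misaligned"),
]
print(accuracy(always_aligned, demo))  # → 0.5
```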

MisActBench data collection
We collect trajectories from two sources: (a) running diverse CUAs on adversarial benchmarks to capture attack-induced misalignment; (b) synthesizing trajectories with internally arising misalignment by injecting unintended subgoals into benign tasks.

MisActBench covers all three categories of misaligned actions:

  • Malicious Instruction Following: The action complies with malicious instructions in external environments to achieve an attacker's goal.
  • Harmful Unintended Behavior: The action causes harm inadvertently due to inherent limitations (e.g., reasoning error) rather than adversarial attack.
  • Other Task-Irrelevant Behavior: The action causes no harmful consequences but is irrelevant to the user's task, degrading efficiency and reliability.

Benchmark Statistics

  • 558 trajectories
  • 2,264 annotated actions
  • 0.84 annotator agreement (Fleiss' κ)

Action Distribution

  • 1,264 aligned actions
  • 1,000 misaligned actions, comprising:
      • 562 Malicious Instruction Following
      • 210 Harmful Unintended Behavior
      • 228 Other Task-Irrelevant Behavior

Data Examples

Coordinates in actions are annotated with red markers on the screenshots for better visualization.

    DeAction

    DeAction is a runtime guardrail that intercepts each proposed action before execution, assesses its alignment with the user's intent, and triggers a correction loop when misalignment is detected. It is plug-and-play for any CUA — requiring only environment state and action proposals, with no access to model internals.

    🔍

    Two-Stage Detection

    Lightweight fast check for obvious cases, systematic analysis for ambiguous ones — balancing accuracy and latency.

    🔄

    Iterative Correction

    Structured feedback steers the agent toward aligned actions, with 78% of flagged actions ultimately corrected.

    🔌

    Plug-and-Play

    Works with any CUA — no model internals needed, just environment state and action proposals.

    DeAction pipeline: (1) intercept proposed action, (2) two-stage alignment detection, (3) if misaligned, generate structured feedback and prompt the agent to revise.
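    The intercept-detect-correct loop described above can be sketched as follows. All class and function interfaces here are hypothetical stand-ins (the page does not specify DeAction's actual APIs); the stubs only exercise the control flow:

```python
def deaction_step(agent, detector, feedback_fn, state, intent, max_revisions=3):
    """Guard one agent step: intercept the proposed action, check alignment,
    and iterate structured feedback until the action is aligned or the
    revision budget runs out. All interfaces are hypothetical stand-ins."""
    action = agent.propose(state, intent)
    for _ in range(max_revisions):
        if detector(intent, state, action) == "aligned":
            return action                    # cleared for execution
        feedback = feedback_fn(intent, state, action)
        action = agent.propose(state, intent, feedback=feedback)
    return None                              # still misaligned: block the action

# --- Minimal stubs to exercise the loop ---
class StubAgent:
    def __init__(self):
        self.calls = 0
    def propose(self, state, intent, feedback=None):
        self.calls += 1
        # first proposal is off-task; the revised one follows the feedback
        return "click('Install extension')" if self.calls == 1 else "click('Search')"

def stub_detector(intent, state, action):
    return "misaligned" if "Install" in action else "aligned"

def stub_feedback(intent, state, action):
    return f"Action {action!r} deviates from intent {intent!r}; propose an aligned alternative."

print(deaction_step(StubAgent(), stub_detector, stub_feedback, "desktop", "search the web"))
# → click('Search')
```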

    Experiments

    Offline Evaluation

    We evaluate misaligned action detection on MisActBench across multiple backbones. DeAction consistently outperforms prior methods (Task Shield, InferAct), achieving 9–16% absolute F1 improvement.

    💡 Takeaway: DeAction achieves the best F1 across all backbones by improving precision without sacrificing recall — reducing false alarms that would otherwise block legitimate actions.

    Backbone           Method       Precision  Recall   Acc    F1
    Qwen3-VL-32B       Task Shield       50.6    69.0   56.6   58.4
                       InferAct          47.1    89.0   51.1   61.6
                       DeAction          80.1    63.3   76.9   70.7
    Claude Sonnet 4.5  Task Shield       59.5    75.8   66.5   66.6
                       InferAct          48.4    96.0   53.1   64.3
                       DeAction          88.2    74.0   84.1   80.4
    GPT-5.1 Instant    Task Shield       51.4    88.8   58.0   65.1
                       InferAct          51.4    87.3   58.0   64.7
                       DeAction          73.4    86.4   80.2   79.4
    GPT-5.1 Thinking   Task Shield       61.3    73.6   67.8   66.9
                       InferAct          56.0    70.1   62.5   62.3
                       DeAction          89.9    76.8   85.9   82.8
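    As a sanity check, the F1 column is the harmonic mean of precision and recall. Recomputing it from the reported (rounded) precision and recall reproduces the DeAction rows below to the displayed digit; other rows may differ in the last digit if the paper computes F1 from unrounded values:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Recompute two DeAction rows from their reported precision/recall (values in %)
print(round(f1(80.1, 63.3), 1))  # Qwen3-VL-32B → 70.7
print(round(f1(89.9, 76.8), 1))  # GPT-5.1 Thinking → 82.8
```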

    Online Evaluation

    We deploy DeAction with real CUAs in adversarial (RedTeamCUA) and benign (OSWorld) environments. DeAction reduces attack success rate by 90% or more while preserving, and in some cases improving, benign task success.

    💡 Takeaway: DeAction achieves the lowest ASR across all CUAs while maintaining competitive benign task success, demonstrating practical deployability.

    Columns: ASR = Attack Success Rate and UA = Utility under Attack, both on adversarial RedTeamCUA; SR = Success Rate on benign OSWorld tasks.

    Agent              Defense      ASR (%) ↓  UA (%) ↑  SR (%) ↑
    Claude Sonnet 4.5  No Defense        60.0      44.0      42.9
                       DeAction           6.0      76.0      40.7
    OpenAI CUA         No Defense        42.0      82.0      26.0
                       DeAction           4.0      84.0      30.7
    OpenCUA-72B        No Defense        32.0      48.0      39.0
                       DSP               24.0      52.0      38.3
                       PromptArmor       26.0      44.0      35.2
                       Task Shield       22.0      58.0      36.6
                       InferAct          22.0      70.0      36.5
                       DeAction           2.0      60.0      39.6
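    The headline ASR-reduction figure is a relative drop, which quick arithmetic over the No Defense vs. DeAction rows in the table confirms for all three agents:

```python
# Relative ASR reduction per agent: No Defense vs. DeAction ASR from the table
asr = {
    "Claude Sonnet 4.5": (60.0, 6.0),
    "OpenAI CUA": (42.0, 4.0),
    "OpenCUA-72B": (32.0, 2.0),
}
for agent, (before, after) in asr.items():
    reduction = 100 * (before - after) / before
    print(f"{agent}: {reduction:.1f}% relative ASR reduction")
# → 90.0%, 90.5%, and 93.8% respectively
```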

    Runtime Analysis

    We analyze the runtime behavior of DeAction in online experiments, with statistics averaged across all settings and CUAs.

    💡 Takeaway: DeAction accounts for 25% of per-step execution time, with 45% of actions cleared by the fast check. When misalignment is flagged, iterative feedback corrects 78% of cases — recovering aligned behavior rather than simply blocking.

      • Latency: 25% of per-step execution time (7.2s out of 28.1s avg.)
      • Two-stage detection: 45% of actions approved directly by the fast check (3.2s avg. latency)
      • Correction: 78% of flagged actions corrected (62% in a single revision)

    BibTeX

    If you find this work useful, please consider citing our paper:

    @misc{ning2026actionsofftaskdetectingcorrecting,
          title={When Actions Go Off-Task: Detecting and Correcting Misaligned Actions in Computer-Use Agents},
          author={Yuting Ning and Jaylen Jones and Zhehao Zhang and Chentao Ye and Weitong Ruan and Junyi Li and Rahul Gupta and Huan Sun},
          year={2026},
          eprint={2602.08995},
          archivePrefix={arXiv},
          primaryClass={cs.CL},
          url={https://arxiv.org/abs/2602.08995},
    }