Human-Guided Harm Recovery for Computer Use Agents

Christy Li, Sky CH-Wang, Andi Peng, Andreea Bobu

🏛 Institutions: Unknown
📅 Date: April 20, 2026
📑 Publisher: arXiv
💻 Env: General GUI
🔑 Keywords: safety, harm recovery, human preferences, recovery planning

TLDR

This paper formalizes harm recovery for computer-use agents: steering an agent from a harmful post-execution state back to a safe one aligned with human preferences. It builds a user-study-derived rubric, collects pairwise recovery judgments, and uses a reward model to rank recovery plans.
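
As a rough illustration of how pairwise judgments can be turned into a ranking over recovery plans, the sketch below fits a Bradley-Terry model, a standard stand-in chosen here for brevity. It is not the paper's reward model, whose details this summary does not give. The plan descriptions, the judgment data, and the function names `fit_bradley_terry` and `rank_plans` are all hypothetical.

```python
import math

def fit_bradley_terry(num_plans, judgments, steps=500, lr=0.1, l2=0.01):
    """Fit one scalar score per plan from (winner, loser) index pairs."""
    scores = [0.0] * num_plans
    for _ in range(steps):
        grads = [0.0] * num_plans
        for winner, loser in judgments:
            # Probability that the winner is preferred under Bradley-Terry.
            p = 1.0 / (1.0 + math.exp(scores[loser] - scores[winner]))
            grads[winner] += 1.0 - p
            grads[loser] -= 1.0 - p
        # Gradient ascent on the log-likelihood; the small L2 penalty
        # keeps scores finite when a plan never loses.
        scores = [s + lr * (g - l2 * s) for s, g in zip(scores, grads)]
    return scores

def rank_plans(plans, scores):
    """Return plans ordered from highest to lowest score."""
    return [p for _, p in sorted(zip(scores, plans), key=lambda t: -t[0])]

# Hypothetical recovery plans and pairwise judgments (winner, loser).
plans = [
    "restore files from backup",
    "notify user and wait",
    "delete remaining files",
]
judgments = [(0, 1), (0, 2), (1, 2), (0, 1), (1, 2)]

scores = fit_bradley_terry(len(plans), judgments)
ranked = rank_plans(plans, scores)
print(ranked)
```

A learned reward model would typically score plans from their content rather than keep one free parameter per plan. The per-plan scores here only show the preference-to-ranking step.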


Related papers (24)

Are GUI Agents Focused Enough? Automated Distraction via Semantic-level UI Element Injection

April 9, 2026 · arXiv
LPS-Bench: Benchmarking Safety Awareness of Computer-Use Agents in Long-Horizon Planning under Benign and Adversarial Scenarios

February 3, 2026 · arXiv
SafePred: A Predictive Guardrail for Computer-Using Agents via World Models

February 2, 2026 · arXiv
GEM: Gaussian Embedding Modeling for Out-of-Distribution Detection in GUI Agents

May 19, 2025 · arXiv
A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron?

May 16, 2025 · arXiv
OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use

December 20, 2024 · ACL 2025
The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use Agents

April 12, 2026 · arXiv
CORA: Conformal Risk-Controlled Agents for Safeguarded Mobile GUI Automation

April 10, 2026 · arXiv
Preference Redirection via Attention Concentration: An Attack on Computer Use Agents

April 9, 2026 · arXiv
Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal Attacks

March 4, 2026 · arXiv
When Benign Inputs Lead to Severe Harms: Eliciting Unsafe Unintended Behaviors of Computer-Use Agents

February 9, 2026 · arXiv
When Actions Go Off-Task: Detecting and Correcting Misaligned Actions in Computer-Use Agents

February 9, 2026 · arXiv
CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents

January 14, 2026 · arXiv
WebTrap Park: An Automated Platform for Systematic Security Evaluation of Web Agents

January 13, 2026 · arXiv
It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents

December 29, 2025 · arXiv
DECEPTICON: How Dark Patterns Manipulate Web Agents

December 28, 2025 · arXiv
Permission Manifests for Web Agents

December 7, 2025 · arXiv
Genesis: Evolving Attack Strategies for LLM Web Agent Red-Teaming

October 21, 2025 · ICME 2026
Investigating the Impact of Dark Patterns on LLM-Based Web Agents

October 20, 2025 · IEEE S&P 2026
macOSWorld: A Multilingual Interactive Benchmark for GUI Agents

June 4, 2025 · NeurIPS 2025 (Poster)
RiOSWorld: Benchmarking the Risk of Multimodal Computer-Use Agents

May 31, 2025 · NeurIPS 2025 (Poster)
VeriSafe Agent: Safeguarding Mobile GUI Agent via Logic-based Action Verification

March 24, 2025 · MobiCom 2025
Attacking Vision-Language Computer Agents via Pop-ups

November 4, 2024 · ACL 2025
MobileSafetyBench: Evaluating Safety of Autonomous Agents in Mobile Device Control

October 23, 2024 · arXiv