MirrorGuard: Toward Secure Computer-Use Agents via Simulation-to-Real Reasoning Correction

Wenqi Zhang, Yulin Shen, Changyue Jiang, Jiarun Dai, Geng Hong, Xudong Pan

🏛 Institutions: Fudan, Shanghai Innovation Institute
📅 Date: January 19, 2026
📑 Publisher: arXiv
💻 Env: Desktop
🔑 Keywords: benchmark, simulation-to-real reasoning correction, MirrorWorld, MirrorGuard

TLDR

MirrorGuard is a plug-and-play defense that trains on high-risk trajectories synthesized in a neural-symbolic text simulator called MirrorWorld, then corrects insecure reasoning before real computer-use agents act. Across multiple benchmarks and architectures, it cuts unsafe behavior sharply while preserving utility better than prior defenses.
