GUI Agents Papers

Reliable Weak-to-Strong Monitoring of LLM Agents

Neil Kale, Chen Bo Calvin Zhang, Kevin Zhu, Ankit Aich, Paula Rodriguez, Scale Red Team, Christina Q. Knight, Zifan Wang

🏛 Institutions
Scale AI, Carnegie Mellon University, Massachusetts Institute of Technology
📅 Date
August 26, 2025
📑 Publisher
ICLR 2025
💻 Env
🔑 Keywords
TLDR

Stress-tests how well LLM agent monitoring systems detect covert misbehavior, using a monitor red-teaming (MRT) workflow that varies agent and monitor awareness as well as adversarial evasion strategies. Evaluation uses SHADE-Arena for tool-calling agents and CUA-SHADE-Arena for computer-use agents.
