GUI Agents Papers

Reliable Weak-to-Strong Monitoring of LLM Agents

Neil Kale, Chen Bo Calvin Zhang, Kevin Zhu, Ankit Aich, Paula Rodriguez, Scale Red Team, Christina Q. Knight, Zifan Wang

🏛 Institutions
Scale AI, Carnegie Mellon University, Massachusetts Institute of Technology
📅 Date
August 26, 2025
📑 Publisher
ICLR 2025
TLDR

Stress-tests monitoring systems that detect covert misbehavior by LLM agents. A monitor red-teaming (MRT) workflow varies agent and monitor awareness along with the adversarial evasion strategies used, and is evaluated on SHADE-Arena for tool-calling agents and CUA-SHADE-Arena for computer-use agents.
