GUI Agents Papers

CUAAudit: Meta-Evaluation of Vision-Language Models as Auditors of Autonomous Computer-Use Agents

Marta Sumyk, Oleksandr Kosovan

🏛 Institutions
Ukrainian Catholic University
📅 Date
March 11, 2026
📑 Publisher
HEAL @ CHI 2026 Workshop
💻 Env
Desktop
TLDR

CUAAudit studies vision-language models as autonomous judges of desktop-agent task success from observable interactions alone. Across multiple operating-system benchmarks, it finds that even strong VLM auditors degrade on harder environments and disagree substantially with one another, highlighting limits of model-based auditing.
