OS-Oracle: A Comprehensive Framework for Cross-Platform GUI Critic Models

Zhenyu Wu, Jingjing Xie, Zehao Li, Bowen Yang, Qiushi Sun, Zhaoyang Liu, Zhoumianze Liu, Yu Qiao, Xiangyu Yue, Zun Wang, Zichen Ding

🏛 Institutions: SJTU, Shanghai AI Laboratory, CUHK MMLab, HKU, HKUST
📅 Date: December 18, 2025
📑 Publisher: arXiv
💻 Env: Desktop, Mobile, Web
🔑 Keywords: critic model, benchmark, step-level evaluation, CP-GRPO, OS-Critic Bench, OS-Oracle

TLDR

OS-Oracle targets step-level action criticism for computer-use agents with a 310k-sample cross-platform training pipeline, a two-stage SFT plus CP-GRPO recipe, and the OS-Critic Bench benchmark. The resulting 7B critic reaches state of the art among open-source VLM critics and improves downstream GUI agents when used as a pre-critic.
