STEVE: A Step Verification Pipeline for Computer-use Agent Training

Fanbin Lu, Zhisheng Zhong, Ziqin Wei, Shu Liu, Chi-Wing Fu, Jiaya Jia

🏛 Institutions: CUHK, SmartMore, HKUST
📅 Date: March 16, 2025
📑 Publisher: arXiv
💻 Env: Desktop
🔑 Keywords: dataset, model, step verification, binary stepwise labels, KTO, STEVE

TLDR

STEVE trains desktop computer-use agents from suboptimal trajectories by verifying each step against before-and-after screenshots instead of relying on expensive gold trajectories. The resulting binary step labels support KTO training of a 7B agent that outperforms supervised fine-tuning on WinAgentArena.
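
As a rough illustration of the idea in the TLDR, the sketch below turns a trajectory of verified steps into unpaired binary-label records of the kind KTO-style training consumes. Everything here is an assumption made for illustration: the `Step` fields, the `to_kto_records` helper, the `prompt`/`completion`/`label` record layout, and especially `toy_verifier`, which only stands in for whatever step verifier the paper actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass
class Step:
    """One agent step (hypothetical layout, not the paper's schema)."""
    instruction: str        # task the agent was asked to perform
    screenshot_before: str  # identifier or path of the screen before the action
    action: str             # action the agent emitted
    screenshot_after: str   # identifier or path of the screen after the action


# A verifier maps a step to True (step judged correct) or False.
Verifier = Callable[[Step], bool]


def to_kto_records(
    trajectory: List[Step], verify: Verifier
) -> List[Dict[str, Union[str, bool]]]:
    """Label every step independently and emit one unpaired record per step."""
    records = []
    for step in trajectory:
        records.append(
            {
                "prompt": f"Task: {step.instruction}\nScreen: {step.screenshot_before}",
                "completion": step.action,
                "label": verify(step),  # binary stepwise label
            }
        )
    return records


def toy_verifier(step: Step) -> bool:
    """Placeholder only: treats a step as correct if the screen changed."""
    return step.screenshot_before != step.screenshot_after
```

Because each step is labeled on its own, a trajectory that fails overall can still contribute its correct steps as positive examples and its wrong steps as negative ones, which is what lets suboptimal trajectories serve as training data.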
