GUI Agents Papers

DPO Learning with LLMs-Judge Signal for Computer Use Agents

Man Luo, David Cobbley, Xin Su, Shachar Rosenman, Vasudev Lal, Shao-Yen Tseng, Phillip Howard

🏛 Institutions
Intel, Thoughtworks
📅 Date
June 3, 2025
📑 Publisher
arXiv
💻 Env
Desktop
🔑 Keywords
TLDR

This paper targets privacy and compute constraints in computer-use agents by training a lightweight VLM that runs entirely on local machines. It uses an LLM-as-Judge pipeline to score synthetic GUI trajectories and construct DPO preference pairs, then shows that the resulting local agent outperforms baselines on OSWorld.
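The summary does not say exactly how judge scores become preference pairs. Below is a minimal sketch of one plausible scheme, assuming each GUI step has several candidate actions that the judge scores. The highest- and lowest-scored candidates become a chosen/rejected pair, and near-ties are skipped. The names `Candidate`, `build_pair`, `min_margin`, the action strings, and the 0-10 score scale are hypothetical illustrations, not the paper's code. `dpo_loss` is the standard DPO objective, -log σ(β[(log π(y_w) − log π_ref(y_w)) − (log π(y_l) − log π_ref(y_l))]), computed from sequence log-probabilities. It is not necessarily the paper's exact training configuration.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    action: str         # hypothetical textual action for one GUI step
    judge_score: float  # hypothetical LLM-judge score on a 0-10 scale


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str


def build_pair(prompt: str, candidates: List[Candidate],
               min_margin: float = 1.0) -> Optional[PreferencePair]:
    """Pair the best- and worst-scored candidates; skip pairs that are near-ties."""
    if len(candidates) < 2:
        return None
    ranked = sorted(candidates, key=lambda c: c.judge_score, reverse=True)
    best, worst = ranked[0], ranked[-1]
    if best.judge_score - worst.judge_score < min_margin:
        return None  # judge signal too weak to trust as a preference
    return PreferencePair(prompt, best.action, worst.action)


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Standard DPO loss for one pair, given summed log-probs under policy and reference."""
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), evaluated without overflow
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    pair = build_pair(
        "Task: open Settings",
        [Candidate("click(Settings)", 9.0),
         Candidate("scroll(down)", 5.0),
         Candidate("click(Trash)", 2.0)],
    )
    print(pair)
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

Filtering near-ties with `min_margin` reflects a common practice: small score gaps from an LLM judge are noisy, so discarding them trades dataset size for cleaner preferences.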
