Evolving in Tasks: Empowering the Multi-modality Large Language Model as the Computer Use Agent

Yuhao Cheng, Liang Tang, Shuxian Li, Yukang Huo, Tiaonan Duan, Kaer Huang, Yanzhe Jing, Yiqiang Yan

🏛 Institutions: Lenovo, China Agricultural University
📅 Date: August 6, 2025
📑 Publisher: arXiv
💻 Env: Desktop, Web
🔑 Keywords: Self-Evolution Agent, step-wise reinforcement learning, grounding-based generalization enhancement, temporal compressed sensing, OSWorld

TLDR

This paper proposes the Self-Evolution Agent (SEA) for computer use, combining automatic verifiable trajectory generation, efficient step-wise reinforcement learning, and a model-enhancement path that merges grounding and planning ability. It evaluates the resulting agent on grounding benchmarks and OSWorld and frames the method as a way to improve computer-use performance without relying purely on manually curated data.
