GUI Agents Papers

Evolving in Tasks: Empowering the Multi-modality Large Language Model as the Computer Use Agent

Yuhao Cheng, Liang Tang, Shuxian Li, Yukang Huo, Tiaonan Duan, Kaer Huang, Yanzhe Jing, Yiqiang Yan

🏛 Institutions
Lenovo, China Agricultural University
📅 Date
August 6, 2025
📑 Publisher
arXiv
💻 Env
Desktop, Web
TLDR

This paper proposes the Self-Evolution Agent (SEA) for computer use, combining automatic verifiable trajectory generation, efficient step-wise reinforcement learning, and a model-enhancement path that merges grounding and planning ability. It evaluates the resulting agent on grounding benchmarks and OSWorld and frames the method as a way to improve computer-use performance without relying purely on manually curated data.
