GUI Agents Papers

CogAgent: A Visual Language Model for GUI Agents

Wenyi Hong, Weihan Wang, Qingsong Lv, Jiazheng Xu, Wenmeng Yu, Junhui Ji, Yan Wang, Zihan Wang, Yuxiao Dong, Ming Ding, Jie Tang

🏛 Institutions
Tsinghua, Zhipu
📅 Date
December 14, 2023
📑 Publisher
CVPR 2024 (Highlight)
💻 Env
Mobile, Web
TLDR

CogAgent is an 18B visual language model specialized for GUI understanding and navigation. It combines low- and high-resolution image encoders, trains on a large GUI-and-OCR dataset, and outperforms HTML-consuming baselines on Mind2Web and AITW using screenshots alone.
