GUI Agents Papers

CogAgent: A Visual Language Model for GUI Agents

Wenyi Hong, Weihan Wang, Qingsong Lv, Jiazheng Xu, Wenmeng Yu, Junhui Ji, Yan Wang, Zihan Wang, Yuxiao Dong, Ming Ding, Jie Tang

🏛 Institutions
Tsinghua, Zhipu
📅 Date
December 14, 2023
📑 Publisher
CVPR 2024 (Highlight)
💻 Env
Mobile, Web
TLDR

CogAgent is an 18B visual language model specialized for GUI understanding and navigation. It combines low- and high-resolution image encoders, trains on a large GUI-and-OCR dataset, and outperforms HTML-consuming baselines on Mind2Web and AITW using screenshots alone.
