GUI Agents Papers

ScreenAgent: A Vision Language Model-driven Computer Control Agent

Runliang Niu, Jindong Li, Shiqi Wang, Yali Fu, Xiyu Hu, Xueyuan Leng, He Kong, Yi Chang, Qi Wang

🏛 Institutions
Jilin University
📅 Date
February 13, 2024
📑 Publisher
IJCAI 2024
💻 Env
Desktop
🔑 Keywords
TLDR

ScreenAgent builds a real computer-control environment in which a vision-language agent observes screenshots and operates the GUI through mouse and keyboard actions, and pairs it with a planning-acting-reflecting control pipeline. The paper also releases the ScreenAgent Dataset and reports computer-control performance comparable to GPT-4V, with more precise UI positioning.
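
As a rough illustration of what a planning-acting-reflecting loop can look like, here is a minimal Python sketch. It is not the paper's implementation: `Action`, `StubEnvironment`, `run_agent`, and the three reflection verdicts (`"done"`, `"retry"`, `"replan"`) are hypothetical names chosen for this example, and the `plan`, `act`, and `reflect` callables stand in for calls to a vision-language model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """One mouse or keyboard action (hypothetical schema for this sketch)."""
    kind: str            # e.g. "click" or "type"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class StubEnvironment:
    """Stand-in for a real desktop: records actions instead of executing them."""
    log: List[Action] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real environment would return image data; a string is enough here.
        return f"screen after {len(self.log)} actions"

    def execute(self, action: Action) -> None:
        self.log.append(action)


def run_agent(
    env: StubEnvironment,
    plan: Callable[[str], List[str]],
    act: Callable[[str, str], List[Action]],
    reflect: Callable[[str, str], str],
    max_steps: int = 10,
) -> bool:
    """Plan subtasks, act on each, and reflect on the resulting screen.

    Assumed verdicts from `reflect`: "done" moves on, "replan" rebuilds the
    plan from the current screen, anything else retries the same subtask.
    Returns True if every subtask finished within `max_steps` acting rounds.
    """
    subtasks = plan(env.screenshot())            # planning phase
    index, steps = 0, 0
    while index < len(subtasks) and steps < max_steps:
        current = subtasks[index]
        for action in act(current, env.screenshot()):   # acting phase
            env.execute(action)
        steps += 1
        verdict = reflect(current, env.screenshot())    # reflecting phase
        if verdict == "done":
            index += 1
        elif verdict == "replan":
            subtasks = plan(env.screenshot())
            index = 0
    return index == len(subtasks)


def demo():
    """Run the loop with hard-coded stand-ins for the model calls."""
    env = StubEnvironment()

    def plan(shot: str) -> List[str]:
        return ["open editor", "type hello"]

    def act(subtask: str, shot: str) -> List[Action]:
        if subtask == "open editor":
            return [Action("click", x=40, y=60)]
        return [Action("type", text="hello")]

    def reflect(subtask: str, shot: str) -> str:
        return "done"

    return run_agent(env, plan, act, reflect), env.log
```

Calling `demo()` runs two subtasks against the stub environment and returns the success flag together with the recorded actions. Keeping the three phases as separate callables makes it easy to swap in a real model or a real screen-capture backend without changing the loop itself.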
