GUI Agents Papers

ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Gilles Baechler, Srinivas Sunkara, Maria Wang, Fedir Zubach, Hassan Mansoor, Vincent Etter, Victor Cărbune, Jason Lin, Jindong Chen, Abhanshu Sharma

🏛 Institutions
Google Research
📅 Date
February 7, 2024
📑 Publisher
IJCAI 2024
💻 Env
General GUI
TLDR

ScreenAI is a vision-language model for UI and infographics understanding that combines a PaLI-style architecture with pix2struct-style flexible patching. It introduces a screen-annotation task, uses it to generate large-scale UI training data, and releases three datasets for screen annotation and screen question answering.
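As a rough illustration of what "flexible patching" means here, the sketch below picks a patch grid that approximately preserves the screenshot's aspect ratio while staying within a fixed patch budget, instead of resizing every image to a square. It follows the grid-selection rule commonly used for Pix2Struct-style preprocessing; the function name `flexible_patch_grid` and the defaults `patch_size=16` and `max_patches=1024` are assumptions for illustration, not values taken from the ScreenAI paper.

```python
import math


def flexible_patch_grid(image_h, image_w, patch_size=16, max_patches=1024):
    """Return (rows, cols) of a patch grid that roughly keeps the aspect ratio.

    Hypothetical helper: patch_size and max_patches are illustrative defaults.
    """
    # Scale factor such that rows * cols is at most max_patches.
    scale = math.sqrt(max_patches * (patch_size / image_h) * (patch_size / image_w))
    # Clamp each dimension to at least 1 patch and at most the full budget.
    rows = max(min(math.floor(scale * image_h / patch_size), max_patches), 1)
    cols = max(min(math.floor(scale * image_w / patch_size), max_patches), 1)
    return rows, cols


# A portrait phone screenshot gets more rows than columns.
rows, cols = flexible_patch_grid(1920, 1080)
print(rows, cols, rows * cols)
```

The image would then be resized to `rows * patch_size` by `cols * patch_size` pixels before being split into patches, so tall mobile screens and wide desktop screens both keep their shape.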

Related papers (24)