GUI Agents Papers

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Kevin Qinghong Lin, Linjie Li, Difei Gao, Zhengyuan Yang, Shiwei Wu, Zechen Bai, Weixian Lei, Lijuan Wang, Mike Zheng Shou

🏛 Institutions
Show Lab, NUS, Microsoft
📅 Date
November 26, 2024
📑 Publisher
CVPR 2025 (Poster)
💻 Env
Mobile, Web
🔑 Keywords
TLDR

ShowUI is a lightweight vision-language-action model for GUI visual agents that targets efficient screenshot perception and action-history modeling. It introduces UI-guided visual token selection and interleaved vision-language-action streaming, reaching 75.1% accuracy on zero-shot screenshot grounding while remaining competitive on web and mobile GUI tasks.
