GUI Agents Papers

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Kevin Qinghong Lin, Linjie Li, Difei Gao, Zhengyuan Yang, Shiwei Wu, Zechen Bai, Weixian Lei, Lijuan Wang, Mike Zheng Shou

🏛 Institutions
Show Lab, NUS, Microsoft
📅 Date
November 26, 2024
📑 Publisher
CVPR 2025 (Poster)
💻 Env
Mobile, Web
TLDR

ShowUI is a lightweight vision-language-action model for GUI visual agents, designed for efficient screenshot perception and action-history modeling. It introduces UI-guided visual token selection and interleaved vision-language-action streaming, and reaches 75.1% accuracy on zero-shot screenshot grounding while remaining competitive on web and mobile GUI navigation tasks.
