GUI Agents Papers

AppVLM: A Lightweight Vision Language Model for Online App Control

Georgios Papoudakis, Thomas Coste, Zhihao Wu, Jianye Hao, Jun Wang, Kun Shao

🏛 Institutions
Huawei Noah’s Ark Lab, UCL
📅 Date
February 10, 2025
📑 Publisher
arXiv
💻 Env
Mobile
🔑 Keywords
TLDR

AppVLM is a lightweight vision-language model for mobile app control. It is first trained on AndroidControl and then refined with trajectories collected in AndroidWorld. It achieves the best offline action prediction among the compared baselines and matches GPT-4o's online success rate in AndroidWorld while running much faster.
