AppVLM: A Lightweight Vision Language Model for Online App Control

Georgios Papoudakis, Thomas Coste, Zhihao Wu, Jianye Hao, Jun Wang, Kun Shao

🏛 Institutions: Huawei Noah’s Ark Lab, UCL
📅 Date: February 10, 2025
📑 Publisher: arXiv
💻 Env: Mobile
🔑 Keywords: model, lightweight VLM, offline-to-online training, AndroidControl, AndroidWorld, AppVLM

TLDR

AppVLM is a lightweight vision-language model for mobile app control that is trained first on AndroidControl and then refined with trajectories collected in AndroidWorld. It achieves the best offline action prediction among the compared baselines and matches GPT-4o on online AndroidWorld success rate while running much faster.
