AppVLM: A Lightweight Vision Language Model for Online App Control
Georgios Papoudakis , Thomas Coste , Zhihao Wu , Jianye Hao , Jun Wang , Kun Shao
- 🏛 Institutions
- Huawei Noah’s Ark Lab , UCL
- 📅 Date
- February 10, 2025
- 📑 Publisher
- arXiv
- 💻 Env
- Mobile
- 🔑 Keywords
TLDR
AppVLM is a lightweight vision-language model for mobile app control that is trained first on AndroidControl and then refined with trajectories collected in AndroidWorld. It achieves the best offline action prediction among the compared baselines and matches GPT-4o on online AndroidWorld success rate while running much faster.
Related papers (24)
- SE-GA: Memory-Augmented Self-Evolution for GUI AgentsMay 16, 2026 · arXiv
- ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI AgentsApril 13, 2026 · arXiv
- Don't Act Blindly: Robust GUI Automation via Action-Effect Verification and Self-CorrectionApril 7, 2026 · ACL 2026
- UI-Voyager: A Self-Evolving GUI Agent Learning via Failed ExperienceMarch 25, 2026 · arXiv
- HATS: Hardness-Aware Trajectory Synthesis for GUI AgentsMarch 12, 2026 · CVPR 2026
- SecAgent: Efficient Mobile GUI Agent with Semantic ContextMarch 9, 2026 · arXiv
- Mobile-Agent-v3.5: Multi-platform Fundamental GUI AgentsFebruary 15, 2026 · arXiv
- Adaptive Milestone Reward for GUI AgentsFebruary 12, 2026 · arXiv
- UI-Oceanus: Scaling GUI Agents with Synthetic Environmental DynamicsFebruary 11, 2026 · arXiv
- OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task ExecutionJanuary 28, 2026 · arXiv
- MAI-UI Technical Report: Real-World Centric Foundation GUI AgentsDecember 26, 2025 · arXiv
- Step-GUI Technical ReportDecember 17, 2025 · arXiv
- AFRAgent : An Adaptive Feature Renormalization Based High Resolution Aware GUI agentNovember 30, 2025 · WACV 2026
- ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform DataSeptember 18, 2025 · ICLR 2026 (Oral)
- MobileRL: Online Agentic Reinforcement Learning for Mobile GUI AgentsSeptember 10, 2025 · ICLR 2026 (Poster)
- Succeed or Learn Slowly: Sample Efficient Off-Policy Reinforcement Learning for Mobile App ControlSeptember 1, 2025 · NeurIPS 2025 (Poster)
- Mobile-Agent-v3: Fundamental Agents for GUI AutomationAugust 21, 2025 · arXiv
- UI-Venus Technical Report: Building High-performance UI Agents with RFTAugust 14, 2025 · arXiv
- AgentCPM‑GUI: Building Mobile‑Use Agents with Reinforcement Fine‑TuningJune 2, 2025 · EMNLP 2025 System Demonstrations
- UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI AgentsMay 27, 2025 · NeurIPS 2025 (Poster)
- UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement LearningMarch 27, 2025 · arXiv
- Advancing Mobile GUI Agents: A Verifier-Driven Approach to Practical DeploymentMarch 20, 2025 · MobiCom 2026
- SpiritSight Agent: Advanced GUI Agent with One LookMarch 5, 2025 · CVPR 2025 (Poster)
- UI-TARS: Pioneering Automated GUI Interaction with Native AgentsJanuary 21, 2025 · arXiv