GUI Agents Papers

MobileFlow: A Multimodal LLM for Mobile GUI Agent

Songqin Nong, Jiali Zhu, Rui Wu, Jiongchao Jin, Shuo Shan, Xiutian Huang, Wenhao Xu

🏛 Institutions
Ant Group
📅 Date
July 5, 2024
📑 Publisher
arXiv
💻 Env
Mobile
TLDR

MobileFlow adapts Qwen-VL-Chat into a 21B-parameter model for mobile GUI agents, combining hybrid visual encoders, Mixture-of-Experts (MoE) expansion, and GUI-specific alignment and chain-of-thought training. The model is built to handle variable-resolution screens and multilingual interfaces, and it does not depend on system APIs to obtain page layout information.
