GUI Agents Papers

GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation

An Yan, Zhengyuan Yang, Wanrong Zhu, Kevin Lin, Linjie Li, Jianfeng Wang, Jianwei Yang, Yiwu Zhong, Julian McAuley, Jianfeng Gao, Zicheng Liu, Lijuan Wang

🏛 Institutions
UC San Diego, Microsoft, UC Santa Barbara, University of Wisconsin-Madison
📅 Date
November 13, 2023
📑 Publisher
arXiv
💻 Env
Mobile
TLDR

This paper studies zero-shot smartphone GUI navigation with MM-Navigator, a GPT-4V-based mobile agent. It introduces an iOS screen dataset and benchmark, then evaluates transfer to Android by testing the model on a subset of an existing Android navigation dataset.
