GUI Agents Papers

Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V

Jianwei Yang, Hao Zhang, Feng Li, Xueyan Zou, Chunyuan Li, Jianfeng Gao

🏛 Institutions
Microsoft Research
📅 Date
October 17, 2023
📑 Publisher
arXiv
TLDR

Introduces Set-of-Mark (SoM) prompting: an image is segmented into regions, each region is overlaid with an explicit mark, and the marked image is passed to a multimodal model. The paper shows that this simple region marking unlocks much stronger zero-shot visual grounding in GPT-4V without fine-tuning.
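
The marking step can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the regions are already available as binary masks from some segmentation model, assumes numeric marks placed at each region's centroid, and leaves out drawing the marks on the image. All function names and the prompt wording are hypothetical.

```python
from typing import Dict, List, Tuple

Mask = List[List[int]]  # binary mask: 1 = pixel belongs to the region


def mask_centroid(mask: Mask) -> Tuple[int, int]:
    """Return the (row, col) centroid of a binary mask, rounded to a pixel."""
    points = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not points:
        raise ValueError("empty mask")
    row = round(sum(p[0] for p in points) / len(points))
    col = round(sum(p[1] for p in points) / len(points))
    return row, col


def assign_marks(masks: List[Mask]) -> Dict[int, Tuple[int, int]]:
    """Give each region a numeric mark (1, 2, ...) and a position to draw it at.

    Assumption: centroid placement. For non-convex regions the centroid can
    fall outside the region, so a real system would need a better placement rule.
    """
    return {i: mask_centroid(m) for i, m in enumerate(masks, start=1)}


def build_prompt(question: str, marks: Dict[int, Tuple[int, int]]) -> str:
    """Compose a text prompt that refers to the marks overlaid on the image.

    Hypothetical wording; the marked image itself would be sent alongside it.
    """
    ids = ", ".join(str(i) for i in sorted(marks))
    return (
        f"The image has regions labeled with numeric marks: {ids}. "
        f"{question} Answer by citing mark numbers."
    )
```

Because the model can answer with a mark number, its reply maps back to a concrete region in `marks`, which is what ties the text answer to a location in the image.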
