GUI Agents Papers

Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception

Lai Wei, Liangbo He, Jun Lan, Lingzhong Dong, Yutong Cai, Siyuan Li, Huijia Zhu, Weiqiang Wang, Linghe Kong, Yue Wang, Zhuosheng Zhang, Weiran Huang

🏛 Institutions
SJTU, Ant Group, Zhongguancun Academy, Shanghai Innovation Institute
📅 Date
February 12, 2026
📑 Publisher
arXiv
💻 Env
General GUI
🔑 Keywords
TLDR

This paper proposes Region-to-Image Distillation, which trains a model to internalize zoom-in behavior so that it needs no explicit crop-and-reason step at inference time. It also introduces ZoomBench and reports stronger fine-grained perception on both perception and GUI-agent benchmarks.
