GUI Agents Papers

Towards GUI Agents: Vision-Language Diffusion Models for GUI Grounding

Shrinidhi Kumbhar, Haofu Liao, Srikar Appalaraju, Kunwar Yashraj Singh

🏛 Institutions
Arizona State University, Amazon (AWS Agentic AI)
📅 Date
March 27, 2026
📑 Publisher
CVPR 2026
💻 Env
General GUI
🔑 Keywords
TLDR

This paper adapts the discrete diffusion vision-language model LLaDA-V to GUI grounding and proposes a hybrid masking schedule for bounding-box prediction. Across web, desktop, and mobile benchmarks, the model trained with the hybrid schedule outperforms its linear-masked variant and remains competitive with autoregressive VLMs.
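
For context on the linear-masked baseline mentioned above, here is a minimal sketch of linear masking as commonly used in masked discrete diffusion: a noise level t is drawn uniformly from [0, 1] and each target token is masked independently with probability t. The mask string, the function names, and the bounding-box token layout are illustrative assumptions, not taken from the paper; the paper's hybrid schedule is not described in this summary and is not reproduced here.

```python
import random

MASK = "<mask>"  # hypothetical mask token; the real tokenizer's mask id will differ


def linear_mask(tokens, t, rng):
    """Mask each token independently with probability t (linear schedule)."""
    return [MASK if rng.random() < t else tok for tok in tokens]


def sample_masked_example(tokens, rng):
    """Draw a noise level t ~ U(0, 1) and return (t, masked tokens)."""
    t = rng.uniform(0.0, 1.0)
    return t, linear_mask(tokens, t, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed layout: a bounding box serialized as four coordinate tokens.
    box_tokens = ["<box>", "120", "48", "310", "96", "</box>"]
    t, noisy = sample_masked_example(box_tokens, rng)
    print(f"t={t:.3f}", noisy)
```

At t = 0 the sequence is left intact and at t = 1 every token is masked; training asks the model to recover the masked positions from the rest of the sequence and the screenshot.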
