Aria-UI: Visual Grounding for GUI Instructions
Yuhao Yang, Yue Wang, Dongxu Li, Ziyang Luo, Bei Chen, Chao Huang, Junnan Li
- 🏛 Institutions
- HKU, Salesforce AI Research, Alibaba Group, Australian National University, Independent Researcher
- 📅 Date
- December 20, 2024
- 📑 Publisher
- Findings of ACL 2025
- 💻 Env
- General GUI
- 🔑 Keywords
- GUI Grounding, Visual Agents, Instruction Synthesis
TLDR
Aria-UI is a GUI-grounding model that deliberately avoids HTML or AXTree inputs and instead works from pure visual observations. It pairs a scalable instruction-synthesis pipeline with interleaved textual and text-image action histories for context-aware grounding, and reports state-of-the-art results across offline and online grounding benchmarks.
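To make the context-aware grounding setup concrete, below is a minimal sketch of how an interleaved textual and text-image action history might be packed into a single multimodal prompt for a vision-only grounding model. Everything here is an illustrative assumption: `Step`, `build_grounding_prompt`, the message dictionaries, and the coordinate-output instruction are hypothetical, not Aria-UI's actual API or prompt format.

```python
from dataclasses import dataclass

@dataclass
class Step:
    """One past action: always a textual record, optionally its screenshot."""
    action_text: str                     # e.g. 'click(612, 334)  # "Compose" button'
    screenshot_path: str | None = None   # keep the frame for text-image history

def build_grounding_prompt(instruction: str,
                           current_screenshot: str,
                           history: list[Step],
                           max_image_steps: int = 2) -> list[dict]:
    """Assemble an interleaved multimodal message list for a pure-vision
    grounding model: screenshots and text only, no HTML or AXTree.

    Only the most recent steps keep their screenshots (text-image history);
    older steps are compressed to text, bounding the context length.
    """
    messages: list[dict] = []
    cutoff = len(history) - max_image_steps
    for i, step in enumerate(history):
        messages.append({"type": "text", "value": f"Step {i + 1}: {step.action_text}"})
        if step.screenshot_path is not None and i >= cutoff:
            messages.append({"type": "image", "value": step.screenshot_path})
    # Current observation plus the grounding instruction come last.
    messages.append({"type": "image", "value": current_screenshot})
    messages.append({
        "type": "text",
        "value": f"Instruction: {instruction}\n"
                 "Return the (x, y) pixel coordinates of the target element.",
    })
    return messages

if __name__ == "__main__":
    prompt = build_grounding_prompt(
        instruction="Open the settings menu",
        current_screenshot="frames/step_3.png",
        history=[
            Step('click(120, 48)  # app launcher', "frames/step_1.png"),
            Step('type("mail")', "frames/step_2.png"),
        ],
    )
    for m in prompt:
        print(m["type"], "->", m["value"][:60])
```

The design choice this sketch illustrates is the trade-off the TLDR hints at: textual history is cheap and scales to long trajectories, while attaching screenshots to only the latest steps preserves visual context where it matters most.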
Related papers
- Ponder & Press: Advancing Visual GUI Agent towards General Computer ControlDecember 2, 2024 · Findings of ACL 2025
- UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time GroundingJuly 29, 2025 · CVPR 2026 Findings
- UI-E2I-Synth: Advancing GUI Grounding with Large-Scale Instruction SynthesisApril 15, 2025 · Findings of ACL 2025
- Think Twice, Click Once: Enhancing GUI Grounding via Fast and Slow SystemsMarch 9, 2025 · arXiv
- Aguvis: Unified Pure Vision Agents for Autonomous GUI InteractionDecember 5, 2024 · ICML 2025 (Poster)
- AFRAgent : An Adaptive Feature Renormalization Based High Resolution Aware GUI agentNovember 30, 2025 · WACV 2026