Falcon-UI: Understanding GUI Before Following User Instructions
Huawen Shen, Chang Liu, Gengluo Li, Xinlong Wang, Yu Zhou, Can Ma, Xiangyang Ji
- 🏛 Institutions
- Institute of Information Engineering, CAS, Nankai University, Tsinghua, Beijing Academy of Artificial Intelligence, University of Chinese Academy of Sciences
- 📅 Date
- December 12, 2024
- 📑 Publisher
- arXiv
- 💻 Env
- General GUI
- 🔑 Keywords
TLDR
Falcon-UI studies whether GUI-context understanding should be learned before instruction following. It introduces the large instruction-free Insight-UI Dataset for GUI pretraining and shows that this staged training improves a 7B model enough to approach much larger baselines.
Related papers
- Aguvis: Unified Pure Vision Agents for Autonomous GUI InteractionDecember 5, 2024 · ICML 2025 (Poster)
- ScreenAI: A Vision-Language Model for UI and Infographics UnderstandingFebruary 7, 2024 · IJCAI 2024
- MolmoWeb: Open Visual Web Agent and Open Data for the Open WebApril 9, 2026 · arXiv
- SecAgent: Efficient Mobile GUI Agent with Semantic ContextMarch 9, 2026 · arXiv
- ShowUI-π: Flow-based Generative Models as GUI Dexterous HandsDecember 31, 2025 · arXiv
- OpenCUA: Open Foundations for Computer-Use AgentsAugust 12, 2025 · NeurIPS 2025 (Spotlight)