OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
Zhiyong Wu, Zhenyu Wu, Fangzhi Xu, Yian Wang, Qiushi Sun, Chengyou Jia, Kanzhi Cheng, Zichen Ding, Liheng Chen, Paul Pu Liang, Yu Qiao
- 🏛 Institutions
- Shanghai AI Laboratory, SJTU, HKU, MIT
- 📅 Date
- October 30, 2024
- 📑 Publisher
- ICLR 2025 (Spotlight)
- 💻 Env
- Desktop Mobile Web
- 🔑 Keywords
TLDR
OS-Atlas is a foundation action model for GUI agents built on a multi-platform grounding-data synthesis toolkit and a corpus with more than 13 million GUI elements. It improves GUI grounding and zero-shot out-of-distribution agent performance across desktop, mobile, and web benchmarks.
Related papers
- SpiritSight Agent: Advanced GUI Agent with One LookMarch 5, 2025 · CVPR 2025 (Poster)
- UI-TARS: Pioneering Automated GUI Interaction with Native AgentsJanuary 21, 2025 · arXiv
- Ponder & Press: Advancing Visual GUI Agent towards General Computer ControlDecember 2, 2024 · Findings of ACL 2025
- Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI AgentsOctober 7, 2024 · ICLR 2025 (Oral)
- SeeClick: Harnessing GUI Grounding for Advanced Visual GUI AgentsJanuary 17, 2024 · ACL 2024
- OpenCUA: Open Foundations for Computer-Use AgentsAugust 12, 2025 · NeurIPS 2025 (Spotlight)