UFO2: The Desktop AgentOS
Chaoyun Zhang, He Huang, Chiming Ni, Jian Mu, Si Qin, Shilin He, Lu Wang, Fangkai Yang, Pu Zhao, Chao Du, Liqun Li, Yu Kang, Zhao Jiang, Suzhen Zheng, Rujia Wang, Jiaxu Qian, Minghua Ma, Jian-Guang Lou, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang
- 🏛 Institutions
- Microsoft, ZJU-UIUC Institute, NJU, PKU
- 📅 Date
- April 20, 2025
- 📑 Publisher
- arXiv
- 💻 Env
- Desktop
- 🔑 Keywords
TLDR
UFO2 presents a Windows AgentOS in which a coordinating HostAgent delegates work to specialized AppAgents, each responsible for an individual application. Its main system ideas are a unified GUI-API action layer, hybrid perception that combines Windows UI Automation (UIA) with vision, speculative multi-action execution, and a picture-in-picture virtual desktop that lets the user and the agent work concurrently.
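The TLDR compresses several mechanisms into a single sentence. The Python sketch below illustrates two of them in toy form. The first is a unified action layer that prefers a native API path and falls back to a GUI operation. The second is speculative multi-action execution, where a batch of steps predicted in one call runs until a target control is no longer present. Everything here is hypothetical: the names `Action`, `execute`, and `run_speculative`, the control IDs, and the "control still exists" validation rule are illustrative assumptions, not UFO2's actual code or API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple


@dataclass
class Action:
    """One step in a batch predicted by a (hypothetical) AppAgent in a single LLM call."""
    control_id: str                            # target UI control, e.g. a UIA element id
    name: str                                  # operation name, e.g. "click" or "set_text"
    args: dict                                 # keyword arguments for the operation
    api: Optional[Callable[..., str]] = None   # native API handler, if the app exposes one


def gui_execute(action: Action) -> str:
    # Placeholder for a GUI-level operation (mouse/keyboard on the control); args ignored here.
    return f"gui:{action.name}({action.control_id})"


def execute(action: Action) -> str:
    """Unified action layer (assumed design): use the API path when available, else GUI."""
    if action.api is not None:
        return action.api(**action.args)
    return gui_execute(action)


def run_speculative(actions: List[Action],
                    live_controls: Callable[[], Set[str]]) -> Tuple[List[str], int]:
    """Run a predicted batch, re-checking before every step that its target control
    is still present. Returns the results so far and the index of the first action
    that was not executed (len(actions) if all of them ran)."""
    results: List[str] = []
    for i, action in enumerate(actions):
        if action.control_id not in live_controls():
            return results, i  # UI changed: stop so the agent can re-observe and re-plan
        results.append(execute(action))
    return results, len(actions)


# Toy usage: saving through the API closes a dialog, invalidating the last step.
ui = {"name_box", "save_btn", "dialog_ok"}

def save_api(**kwargs) -> str:
    ui.discard("dialog_ok")  # side effect of the save in this toy model
    return "api:save"

batch = [
    Action("name_box", "set_text", {"text": "report"}),
    Action("save_btn", "save", {}, api=save_api),
    Action("dialog_ok", "click", {}),
]
done, stopped_at = run_speculative(batch, lambda: set(ui))
print(done, stopped_at)  # ['gui:set_text(name_box)', 'api:save'] 2
```

In this sketch, re-validating before each step is what keeps speculation cheap: one model call can cover several steps, and a wrong prediction costs only a re-plan from the first stale action rather than an erroneous click.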
Related papers (24)
- VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation · April 23, 2026 · arXiv
- EE-MCP: Self-Evolving MCP-GUI Agents via Automated Environment Generation and Experience Learning · April 10, 2026 · arXiv
- ShowUI-Aloha: Human-Taught GUI Agent · January 12, 2026 · arXiv
- Surfer 2: The Next Generation of Cross-Platform Computer Use Agents · October 22, 2025 · arXiv
- BIMgent: Towards Autonomous Building Modeling via Computer-use Agents · June 8, 2025 · ICML 2025 Workshop on Computer-use Agents
- LiteCUA: Computer as MCP Server for Computer-Use Agent on AIOS · May 24, 2025 · arXiv
- WorldGUI: An Interactive Benchmark for Desktop GUI Automation from Any Starting Point · February 12, 2025 · arXiv
- PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World · December 23, 2024 · arXiv
- AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant · October 24, 2024 · Findings of ACL 2025
- Agent S: An Open Agentic Framework that Uses Computers Like a Human · October 10, 2024 · ICLR 2025 (Poster)
- AXIS: Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents · September 26, 2024 · ACL 2025
- OS-Copilot: Towards Generalist Computer Agents with Self-Improvement · February 12, 2024 · LLMAgents @ ICLR 2024
- Synapse: Trajectory-as-Exemplar Prompting with Memory for Computer Control · June 13, 2023 · ICLR 2024 (Poster)
- MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research · May 25, 2026 · arXiv
- ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents · April 13, 2026 · arXiv
- OpeFlo: Automated UX Evaluation via Simulated Human Web Interaction with GUI Grounding · February 25, 2026 · arXiv
- TreeCUA: Efficiently Scaling GUI Automation with Tree-Structured Verifiable Evolution · February 10, 2026 · arXiv
- LongHorizonUI: A Unified Framework for Robust long-horizon Task Automation of GUI Agent · January 26, 2026 · ICLR 2026 (Poster)
- GraphPilot: GUI Task Automation with One-Step LLM Reasoning Powered by Knowledge Graph · January 24, 2026 · Journal of Intelligent Computing and Networking
- ColorBrowserAgent: Complex Long-Horizon Browser Agent with Adaptive Knowledge Evolution · January 12, 2026 · arXiv
- GUITester: Enabling GUI Agents for Exploratory Defect Discovery · January 8, 2026 · arXiv
- WebATLAS: An LLM Agent with Experience-Driven Memory and Action Simulation · October 26, 2025 · NeurIPS 2025 Workshop on Language Agents and World Models
- PolySkill: Learning Generalizable Skills Through Polymorphic Abstraction · October 17, 2025 · arXiv
- CORE: Reducing UI Exposure in Mobile Agents via Collaboration Between Cloud and Local LLMs · October 17, 2025 · NeurIPS 2025 (Poster)