Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents
Saaket Agashe, Kyle Wong, Vincent Tu, Jiachen Yang, Ang Li, Xin Eric Wang
- 🏛 Institutions: Simular Research
- 📅 Date: April 1, 2025
- 📑 Publisher: COLM 2025
- 💻 Env: General GUI
TLDR
Agent S2 is a compositional generalist-specialist framework that splits computer-use responsibilities across specialized and generalist models rather than using a single monolithic agent. Its core methods are Mixture-of-Grounding for precise localization and Proactive Hierarchical Planning for long-horizon control, yielding strong gains on OSWorld, WindowsAgentArena, and AndroidWorld.
Related papers
- Watch and Learn: Learning to Use Computers from Online Videos · October 6, 2025 · CVPR 2026
- CoAct-1: Computer-using Multi-Agent System with Coding Actions · August 5, 2025 · ICLR 2026 (Poster)
- LiteCUA: Computer as MCP Server for Computer-Use Agent on AIOS · May 24, 2025 · arXiv
- Agent Alpha: Tree Search Unifying Generation, Exploration and Evaluation for Computer-Use Agents · February 3, 2026 · arXiv
- Agentic Reward Modeling: Verifying GUI Agent via Online Proactive Interaction · January 31, 2026 · arXiv
- BEAP-Agent: Backtrackable Execution and Adaptive Planning for GUI Agents · January 29, 2026 · arXiv