ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands
Siyuan Hu, Kevin Qinghong Lin, Mike Zheng Shou
- 🏛 Institutions: Show Lab, NUS
- 📅 Date: December 31, 2025
- 📑 Publisher: arXiv
- 💻 Env: Desktop
- 🔑 Keywords
TLDR
ShowUI-π treats GUI dragging as a continuous dexterous-control problem rather than as discrete point prediction alone, while still supporting ordinary click actions in the same model. It also introduces ScreenDrag, a benchmark with 20K trajectories across five domains. On this benchmark, the 450M-parameter model outperforms much larger proprietary GUI agents.
Related papers
- Efficient Agent Training for Computer Use · May 20, 2025 · ICLR 2026 (Poster)
- SecAgent: Efficient Mobile GUI Agent with Semantic Context · March 9, 2026 · arXiv
- Beyond Clicking: A Step Towards Generalist GUI Grounding via Text Dragging · November 7, 2025 · arXiv
- Web-Shepherd: Advancing PRMs for Reinforcing Web Agents · May 21, 2025 · NeurIPS 2025 (Spotlight)
- Gym-Anything: Turn any Software into an Agent Environment · April 7, 2026 · arXiv
- When Actions Go Off-Task: Detecting and Correcting Misaligned Actions in Computer-Use Agents · February 9, 2026 · arXiv