ToolTok: Tool Tokenization for Efficient and Generalizable GUI Agents
Xiaoce Wang, Guibin Zhang, Junzhe Li, Jinzhe Tu, Chun Li, Ming Li
- 🏛 Institutions
- Tsinghua, NUS, PKU, Shenzhen MSU-BIT University, Guangming Laboratory
- 📅 Date
- January 30, 2026
- 📑 Publisher
- arXiv
- 💻 Env
- General GUI
- 🔑 Keywords
TLDR
ToolTok treats GUI operations as paths over learnable tool tokens with semantic anchoring and curriculum learning. This makes GUI agents more efficient and generalizable, reaching competitive performance with far less training data than other post-training methods.
Related papers
- CocoaBench: Evaluating Unified Digital Agents in the Wild (April 13, 2026 · arXiv)
- Are GUI Agents Focused Enough? Automated Distraction via Semantic-level UI Element Injection (April 9, 2026 · arXiv)
- Autonomous Continual Learning of Computer-Use Agents for Environment Adaptation (February 10, 2026 · arXiv)
- iSHIFT: Lightweight Slow-Fast GUI Agent with Adaptive Perception (December 26, 2025 · arXiv)
- Visual Grounding for User Interfaces (June 16, 2024 · NAACL 2024 Industry Track)
- The Amazing Agent Race: Strong Tool Users, Weak Navigators (April 11, 2026 · arXiv)