iSHIFT: Lightweight Slow-Fast GUI Agent with Adaptive Perception
Sarthak Mehrotra, Sairam V C Rebbapragada, Mani Hemanth Reddy Bonthu, Vineeth N Balasubramanian
- 🏛 Institutions
- Indian Institute of Technology, Bombay, Indian Institute of Technology, Hyderabad
- 📅 Date
- December 26, 2025
- 📑 Publisher
- arXiv
- 💻 Env
- General GUI
- 🔑 Keywords
TLDR
iSHIFT is a 2.5B GUI agent that combines latent thinking with perception-control tokens so it can switch between a fast global mode and a slower grounding-heavy mode. The paper positions this as a way to allocate reasoning depth and visual focus adaptively while still matching state-of-the-art results on multiple GUI benchmarks.
Related papers
- CocoaBench: Evaluating Unified Digital Agents in the WildApril 13, 2026 · arXiv
- Are GUI Agents Focused Enough? Automated Distraction via Semantic-level UI Element InjectionApril 9, 2026 · arXiv
- ToolTok: Tool Tokenization for Efficient and Generalizable GUI AgentsJanuary 30, 2026 · arXiv
- Visual Grounding for User InterfacesJune 16, 2024 · NAACL 2024 Industry Track
- How do Visual Attributes Influence Web Agents? A Comprehensive Evaluation of User Interface Design FactorsJanuary 29, 2026 · arXiv
- Iris: Breaking GUI Complexity with Adaptive Focus and Self-RefiningDecember 13, 2024 · arXiv