iSHIFT: Lightweight Slow-Fast GUI Agent with Adaptive Perception

Sarthak Mehrotra, Sairam V C Rebbapragada, Mani Hemanth Reddy Bonthu, Vineeth N Balasubramanian

🏛 Institutions: Indian Institute of Technology Bombay, Indian Institute of Technology Hyderabad
📅 Date: December 26, 2025
📑 Publisher: arXiv
💻 Env: General GUI
🔑 Keywords: slow-fast reasoning, visual grounding, latent thinking, adaptive perception, iSHIFT

TLDR

iSHIFT is a 2.5B-parameter GUI agent that combines latent thinking with perception-control tokens, letting it switch between a fast global mode and a slower, grounding-heavy mode. The paper positions this as a way to allocate reasoning depth and visual focus adaptively while still matching state-of-the-art results on multiple GUI benchmarks.
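The fast/slow switch described above can be pictured with a small dispatch sketch. This is only an illustration of the general idea, not the paper's implementation: the token strings `<fast>` and `<slow>`, the `Step` record, and the three callables passed to `run_step` are all hypothetical names chosen for this example, and the real model presumably makes these decisions inside a single network rather than through separate functions.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical control-token strings; the paper's actual tokens are not given here.
FAST = "<fast>"
SLOW = "<slow>"

Point = Tuple[int, int]


@dataclass
class Step:
    mode: str                 # "fast" or "slow"
    action: str               # action chosen from the global view
    target: Optional[Point]   # grounded coordinates, only set in slow mode


def run_step(
    instruction: str,
    screen: dict,
    emit_control_token: Callable[[str, dict], str],
    fast_policy: Callable[[str, dict], str],
    ground: Callable[[str, dict], Point],
) -> Step:
    """Pick an action, spending the extra grounding pass only when asked to."""
    token = emit_control_token(instruction, screen)
    action = fast_policy(instruction, screen)
    if token == SLOW:
        # Slow path: run the (assumed more expensive) grounding step.
        return Step("slow", action, ground(instruction, screen))
    # Fast path: act from the global view without localising a target.
    return Step("fast", action, None)
```

The point of the sketch is the control flow: the costly grounding call sits behind the control token, so steps that do not need precise localisation skip it.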
