TinyClick: Single-Turn Agent for Empowering GUI Automation

Pawel Pawlowski, Krystian Zawistowski, Wojciech Lapacz, Adam Wiacek, Marcin Skorupa, Sebastien Postansque, Jakub Hoscilowicz

🏛 Institutions: Samsung R&D Poland, Warsaw University of Technology
📅 Date: October 9, 2024
📑 Publisher: INTERSPEECH 2025
💻 Env: Desktop, Mobile, Web
🔑 Keywords: GUI grounding, single-turn agent, on-device model, Florence-2, ScreenSpot, OmniACT, TinyClick

TLDR

TinyClick is a 0.27B-parameter single-turn GUI agent built on Florence-2-Base. Given a screenshot and a user command, it predicts the screen location of the target UI element in one step. The paper attributes its gains to vision-specific multitask training and MLLM-based data augmentation, and reports strong results on ScreenSpot and on OmniACT annotations while keeping latency and training cost low.
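Because the agent is single-turn, its whole job reduces to emitting one click point per command, and grounding benchmarks such as ScreenSpot score that point by whether it lands inside the target element's bounding box. The sketch below illustrates that loop under stated assumptions: it assumes the model emits two Florence-2-style location tokens (`<loc_N>`, one per axis) and that coordinates are quantized into 1000 bins and decoded at the bin center, which is our understanding of Florence-2's convention, not a confirmed detail of TinyClick. The output string format, `NUM_BINS`, `Box`, `parse_click`, and `click_accuracy` are all hypothetical names for illustration.

```python
import re
from dataclasses import dataclass

# Assumption: Florence-2-style quantization into 1000 bins per axis.
NUM_BINS = 1000

# Assumption: the decoder emits location tokens of the form <loc_N>.
LOC_PATTERN = re.compile(r"<loc_(\d+)>")


@dataclass
class Box:
    """Hypothetical ground-truth bounding box in pixel coordinates."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def dequantize(bin_index: int, size: int, num_bins: int = NUM_BINS) -> float:
    """Map a bin index back to the pixel coordinate at the center of that bin."""
    if not 0 <= bin_index < num_bins:
        raise ValueError(f"bin index {bin_index} outside [0, {num_bins})")
    return (bin_index + 0.5) * size / num_bins


def parse_click(output: str, width: int, height: int) -> tuple[float, float]:
    """Read the first two <loc_N> tokens as an (x, y) click point in pixels."""
    bins = [int(m) for m in LOC_PATTERN.findall(output)]
    if len(bins) < 2:
        raise ValueError(f"expected at least two <loc_N> tokens in {output!r}")
    return dequantize(bins[0], width), dequantize(bins[1], height)


def click_accuracy(predictions: list[tuple[float, float]], targets: list[Box]) -> float:
    """ScreenSpot-style metric: fraction of predicted points inside their target box."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not predictions:
        return 0.0
    hits = sum(box.contains(x, y) for (x, y), box in zip(predictions, targets))
    return hits / len(predictions)


if __name__ == "__main__":
    # Hypothetical decoder output for a 1920x1080 screenshot.
    point = parse_click("click <loc_500><loc_250>", width=1920, height=1080)
    targets = [Box(900, 250, 1000, 300), Box(500, 500, 600, 600)]
    # The first prediction lands in its box, the second misses.
    print(click_accuracy([point, (10.0, 10.0)], targets))  # 0.5
```

Decoding at the bin center rather than the bin edge keeps the worst-case quantization error at half a bin, roughly one pixel per axis on a 1920-pixel-wide screen, which is small relative to typical UI element sizes.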
