ClickAgent: Enhancing UI Location Capabilities of Autonomous Agents
Jakub Hoscilowicz , Bartosz Maj , Bartosz Kozakiewicz , Oleksii Tymoshchuk , Artur Janicki
- 🏛 Institutions
- Samsung R&D Poland , Warsaw University of Technology
- 📅 Date
- October 9, 2024
- 📑 Publisher
- SIGDIAL 2025
- 💻 Env
- Mobile
- 🔑 Keywords
TLDR
Proposes ClickAgent, a mobile agent framework that separates high-level reasoning from precise UI element localization. By pairing an MLLM planner with a dedicated grounding component, it improves task success on AITW and on real-device Android evaluations.
Related papers (24)
- You Only Look at Screens: Multimodal Chain-of-Action AgentsSeptember 20, 2023 · Findings of ACL 2024
- OpeFlo: Automated UX Evaluation via Simulated Human Web Interaction with GUI GroundingFebruary 25, 2026 · arXiv
- MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent ResearchMay 25, 2026 · arXiv
- ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI AgentsApril 13, 2026 · arXiv
- GraphPilot: GUI Task Automation with One-Step LLM Reasoning Powered by Knowledge GraphJanuary 24, 2026 · Journal of Intelligent Computing and Networking
- GUITester: Enabling GUI Agents for Exploratory Defect DiscoveryJanuary 8, 2026 · arXiv
- VenusBench-GD: A Comprehensive Multi-Platform GUI Benchmark for Diverse Grounding TasksDecember 18, 2025 · arXiv
- AFRAgent : An Adaptive Feature Renormalization Based High Resolution Aware GUI agentNovember 30, 2025 · WACV 2026
- Surfer 2: The Next Generation of Cross-Platform Computer Use AgentsOctober 22, 2025 · arXiv
- CORE: Reducing UI Exposure in Mobile Agents via Collaboration Between Cloud and Local LLMsOctober 17, 2025 · NeurIPS 2025 (Poster)
- Agent-SAMA: State-Aware Mobile AssistantMay 29, 2025 · AAAI 2026
- BacktrackAgent: Enhancing GUI Agent with Error Detection and Backtracking MechanismMay 27, 2025 · EMNLP 2025 (Oral)
- GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI AgentMay 22, 2025 · ACL 2025
- Building a Stable Planner: An Extended Finite State Machine Based Planning Module for Mobile GUI AgentMay 20, 2025 · arXiv
- ScaleTrack: Scaling and back-tracking Automated GUI AgentsMay 1, 2025 · arXiv
- ReachAgent: Enhancing Mobile Agent via Page Reaching and OperationApril 30, 2025 · NAACL 2025 (Poster)
- MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task AutomationApril 30, 2025 · NAACL 2025 (System Demonstrations)
- LearnAct: Few-Shot Mobile GUI Agent with a Unified Demonstration BenchmarkApril 18, 2025 · arXiv
- UI-TARS: Pioneering Automated GUI Interaction with Native AgentsJanuary 21, 2025 · arXiv
- Ponder & Press: Advancing Visual GUI Agent towards General Computer ControlDecember 2, 2024 · Findings of ACL 2025
- Improved GUI Grounding via Iterative NarrowingNovember 18, 2024 · arXiv
- OS-ATLAS: A Foundation Action Model for Generalist GUI AgentsOctober 30, 2024 · ICLR 2025 (Spotlight)
- Lightweight Neural App ControlOctober 23, 2024 · ICLR 2025 (Spotlight)
- DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control AgentsOctober 18, 2024 · ICLR 2025 (Poster)