CoCo-Agent: A Comprehensive Cognitive MLLM Agent for Smartphone GUI Automation

🏛 Institutions: SJTU
📅 Date: February 19, 2024
📑 Publisher: Findings of ACL 2024
💻 Env: Mobile
🔑 Keywords: smartphone GUI automation, comprehensive environment perception, conditional action prediction, AITW, META-GUI, CoCo-Agent

TLDR

CoCo-Agent is a smartphone GUI agent built around comprehensive environment perception (CEP) and conditional action prediction (CAP). The paper reports state-of-the-art performance on AITW and META-GUI, arguing that richer multimodal environment modeling improves mobile action selection.

Related papers (24)

ClickAgent: Enhancing UI Location Capabilities of Autonomous Agents
October 9, 2024 · SIGDIAL 2025

DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning
June 14, 2024 · NeurIPS 2024 Main Conference Track

CogAgent: A Visual Language Model for GUI Agents
December 14, 2023 · CVPR 2024 (Highlight)

You Only Look at Screens: Multimodal Chain-of-Action Agents
September 20, 2023 · Findings of ACL 2024

Android in the Wild: A Large-Scale Dataset for Android Device Control
July 19, 2023 · NeurIPS 2023 Datasets and Benchmarks Track

META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI
May 23, 2022 · EMNLP 2022

Benchmarking Living-Screen-Native GUI Agents on Short-Video Platforms
June 3, 2026 · arXiv

Context-Aware Workflow Decomposition for Automated Mobile UI Annotation Using Multimodal Large Language Models
June 1, 2026 · arXiv

UI-KOBE: Knowledge-Oriented Behavior Exploration for Lightweight Graph-Guided GUI Agents
May 28, 2026 · arXiv

AndroidDaily: A Verifiable Benchmark for Mobile GUI Agents on Real-World Closed-Source Applications
May 26, 2026 · arXiv

MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research
May 25, 2026 · arXiv

SimuWoB: Simulating Real-World Mobile Apps for Fast and Faithful GUI Agent Benchmarking
May 24, 2026 · arXiv

SE-GA: Memory-Augmented Self-Evolution for GUI Agents
May 16, 2026 · arXiv

ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents
April 13, 2026 · arXiv

CORA: Conformal Risk-Controlled Agents for Safeguarded Mobile GUI Automation
April 10, 2026 · arXiv

KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation
April 9, 2026 · arXiv

Android Coach: Improve Online Agentic Training Efficiency with Single State Multiple Actions
April 8, 2026 · arXiv

Don't Act Blindly: Robust GUI Automation via Action-Effect Verification and Self-Correction
April 7, 2026 · ACL 2026

Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants
April 1, 2026 · arXiv

PSPA-Bench: A Personalized Benchmark for Smartphone GUI Agent
March 31, 2026 · arXiv

UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience
March 25, 2026 · arXiv

Towards Automated Crowdsourced Testing via Personified-LLM
March 25, 2026 · FSE 2026

AgentRAE: Remote Action Execution through Notification-based Visual Backdoors against Screenshots-based Mobile GUI Agents
March 24, 2026 · arXiv

AndroTMem: From Interaction Trajectories to Anchored Memory in Long-Horizon GUI Agents
March 19, 2026 · arXiv