You Only Look at Screens: Multimodal Chain-of-Action Agents

🏛 Institutions: SJTU, Meta
📅 Date: September 20, 2023
📑 Publisher: Findings of ACL 2024
💻 Env: Mobile
🔑 Keywords: framework, benchmark, chain-of-action, Auto-GUI, AITW, screenshot-only control

TLDR

Auto-GUI is a multimodal mobile GUI agent that acts directly on screenshots, avoiding environment parsing and application-specific APIs. The paper introduces a chain-of-action technique, which conditions each decision on previous action histories and future action plans, and evaluates the method on AITW, a device-control benchmark with 30K unique instructions.
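The chain-of-action idea can be pictured as a text format paired with each screenshot: the model sees the goal and its recent actions, and is trained to emit a plan of upcoming actions followed by the action to take now. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation. The `Action` fields, the `format_action` serialization, the `build_chain_of_action_example` helper, and the `history_len` default are all hypothetical names and values chosen for this example, and the screenshot itself (which a vision encoder would consume) is left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Action:
    """One GUI action. Field names are illustrative, not the paper's schema."""
    action_type: str                                  # e.g. "click", "type"
    touch_point: Optional[Tuple[float, float]] = None  # normalized (y, x), if any
    typed_text: str = ""


def format_action(a: Action) -> str:
    """Serialize an action as text (a hypothetical format for this sketch)."""
    parts = [f"type={a.action_type}"]
    if a.touch_point is not None:
        y, x = a.touch_point
        parts.append(f"point=({y:.2f}, {x:.2f})")
    if a.typed_text:
        parts.append(f"text={a.typed_text!r}")
    return "[" + ", ".join(parts) + "]"


def build_chain_of_action_example(
    goal: str, episode: List[Action], step: int, history_len: int = 8
) -> Tuple[str, str]:
    """Return (source_text, target_text) for one step of a recorded episode.

    source: the last `history_len` actions plus the goal (the action history).
    target: the types of the remaining actions (the future plan), followed by
            the full action to execute at this step.
    """
    history = episode[max(0, step - history_len):step]
    future = episode[step:]
    history_text = "; ".join(format_action(a) for a in history) or "none"
    source = f"Previous Actions: {history_text}\nGoal: {goal}"
    plan_text = ", ".join(a.action_type for a in future)
    target = (
        f"Action Plan: {plan_text}\n"
        f"Action Decision: {format_action(episode[step])}"
    )
    return source, target


if __name__ == "__main__":
    demo = [
        Action("click", touch_point=(0.5, 0.5)),
        Action("type", typed_text="wifi"),
        Action("press_enter"),
    ]
    src, tgt = build_chain_of_action_example("Search for wifi settings", demo, step=1)
    print(src)
    print(tgt)
```

Emitting the plan before the decision means the current action is generated with the intended continuation already in context. That is the intuition behind pairing action history with a future action plan.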


Related papers (24)

MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research

May 25, 2026 · arXiv
GUITester: Enabling GUI Agents for Exploratory Defect Discovery

January 8, 2026 · arXiv
GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent

May 22, 2025 · ACL 2025
LearnAct: Few-Shot Mobile GUI Agent with a Unified Demonstration Benchmark

April 18, 2025 · arXiv
ClickAgent: Enhancing UI Location Capabilities of Autonomous Agents

October 9, 2024 · SIGDIAL 2025
AutoDroid: LLM-powered Task Automation in Android

August 29, 2023 · MobiCom 2024
Android in the Wild: A Large-Scale Dataset for Android Device Control

July 19, 2023 · NeurIPS 2023 Datasets and Benchmarks Track
LongHorizonUI: A Unified Framework for Robust long-horizon Task Automation of GUI Agent

January 26, 2026 · ICLR 2026 (Poster)
WorldGUI: An Interactive Benchmark for Desktop GUI Automation from Any Starting Point

February 12, 2025 · arXiv
WebWalker: Benchmarking LLMs in Web Traversal

January 13, 2025 · arXiv
The BrowserGym Ecosystem for Web Agent Research

December 6, 2024 · TMLR
SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models

May 30, 2023 · NeurIPS 2023
Grounding Open-Domain Instructions to Automate Web Support Tasks

March 30, 2021 · NAACL 2021
Benchmarking Living-Screen-Native GUI Agents on Short-Video Platforms

June 3, 2026 · arXiv
AndroidDaily: A Verifiable Benchmark for Mobile GUI Agents on Real-World Closed-Source Applications

May 26, 2026 · arXiv
SimuWoB: Simulating Real-World Mobile Apps for Fast and Faithful GUI Agent Benchmarking

May 24, 2026 · arXiv
ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents

April 13, 2026 · arXiv
CORA: Conformal Risk-Controlled Agents for Safeguarded Mobile GUI Automation

April 10, 2026 · arXiv
KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation

April 9, 2026 · arXiv
Don't Act Blindly: Robust GUI Automation via Action-Effect Verification and Self-Correction

April 7, 2026 · ACL 2026
Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants

April 1, 2026 · arXiv
PSPA-Bench: A Personalized Benchmark for Smartphone GUI Agent

March 31, 2026 · arXiv
AndroTMem: From Interaction Trajectories to Anchored Memory in Long-Horizon GUI Agents

March 19, 2026 · arXiv
GUI-CEval: A Hierarchical and Comprehensive Chinese Benchmark for Mobile GUI Agents

March 16, 2026 · CVPR 2026