SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models

Hongxin Li, Jingran Su, Yuntao Chen, Qing Li, Zhaoxiang Zhang

🏛 Institutions: University of Chinese Academy of Sciences, Hong Kong Institute of Science and Innovation, CAS, PolyU, Shanghai AI Laboratory
📅 Date: May 30, 2023
📑 Publisher: NeurIPS 2023
💻 Env: General GUI
🔑 Keywords: framework, benchmark, dataset, spreadsheet automation, state machine, planning, SheetCopilot

TLDR

SheetCopilot studies spreadsheet control with an LLM agent that plans over a state-machine abstraction of spreadsheet operations. The paper also releases a 221-task spreadsheet-control dataset and an automated evaluation pipeline for benchmarking software-control performance.
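To make the state-machine idea concrete, here is a minimal sketch of a closed-loop planner that proposes one atomic spreadsheet action at a time, has it validated and executed, and receives error feedback before the next step. It is an illustration only, not the paper's implementation: the action names (`Write`, `Sum`), the `run_agent` loop, and the scripted planner standing in for the LLM are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Sheet = Dict[str, float]            # cell address -> value (toy spreadsheet)
Step = Tuple[str, tuple]            # (action name, positional arguments)

# Hypothetical atomic actions; a real system would wrap spreadsheet APIs.
def write_cell(sheet: Sheet, cell: str, value: float) -> None:
    sheet[cell] = value

def sum_cells(sheet: Sheet, target: str, *cells: str) -> None:
    sheet[target] = sum(sheet[c] for c in cells)

ACTIONS: Dict[str, Callable[..., None]] = {"Write": write_cell, "Sum": sum_cells}

def run_agent(propose: Callable[[Sheet, str], Optional[Step]],
              sheet: Sheet, max_steps: int = 10) -> List[str]:
    """Propose -> validate -> execute loop; returns the executed action names."""
    feedback = ""                   # error message shown to the planner next turn
    executed: List[str] = []
    for _ in range(max_steps):
        step = propose(sheet, feedback)
        if step is None:            # planner signals that the task is finished
            break
        name, args = step
        if name not in ACTIONS:     # reject actions outside the allowed set
            feedback = f"unknown action: {name}"
            continue
        try:
            ACTIONS[name](sheet, *args)
        except (KeyError, TypeError) as err:
            feedback = f"{name} failed: {err!r}"
            continue
        feedback = ""
        executed.append(name)
    return executed

def scripted_planner() -> Callable[[Sheet, str], Optional[Step]]:
    """Stand-in for an LLM: replays a fixed plan containing one invalid action."""
    plan = iter([
        ("Write", ("A1", 2.0)),
        ("Write", ("A2", 3.0)),
        ("Total", ("A3", "A1", "A2")),   # not in ACTIONS, so it is rejected
        ("Sum", ("A3", "A1", "A2")),
    ])
    return lambda sheet, feedback: next(plan, None)
```

Running `run_agent(scripted_planner(), {})` executes the two writes and the sum while skipping the invalid `Total` step. In a real agent, the `feedback` string would be fed back into the LLM prompt so the model can revise its next proposal.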


Related papers (24)

LearnAct: Few-Shot Mobile GUI Agent with a Unified Demonstration Benchmark

April 18, 2025 · arXiv
Grounding Open-Domain Instructions to Automate Web Support Tasks

March 30, 2021 · NAACL 2021
LongHorizonUI: A Unified Framework for Robust long-horizon Task Automation of GUI Agent

January 26, 2026 · ICLR 2026 (Poster)
GUIGuard: Toward a General Framework for Privacy-Preserving GUI Agents

January 26, 2026 · arXiv
Beyond Clicking: A Step Towards Generalist GUI Grounding via Text Dragging

November 7, 2025 · arXiv
Scaling Computer‑Use Grounding via User Interface Decomposition and Synthesis

May 19, 2025 · NeurIPS 2025 Datasets and Benchmarks Track (Spotlight)
UI-E2I-Synth: Advancing GUI Grounding with Large-Scale Instruction Synthesis

April 15, 2025 · Findings of ACL 2025
Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding

June 27, 2024 · EMNLP 2024 (Poster)
MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research

May 25, 2026 · arXiv
WebForge: Breaking the Realism-Reproducibility-Scalability Trilemma in Browser Agent Benchmark

April 13, 2026 · arXiv
Gym-Anything: Turn any Software into an Agent Environment

April 7, 2026 · arXiv
WebArena-Infinity: Generating Browser Environments with Verifiable Tasks at Scale

March 2026 · Blog Post
PSPA-Bench: A Personalized Benchmark for Smartphone GUI Agent

March 31, 2026 · arXiv
SecAgent: Efficient Mobile GUI Agent with Semantic Context

March 9, 2026 · arXiv
WebChain: A Large-Scale Human-Annotated Dataset of Real-World Web Interaction Traces

March 5, 2026 · arXiv
Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization

February 24, 2026 · arXiv
AmbiBench: Benchmarking Mobile GUI Agents Beyond One-Shot Instructions in the Wild

February 12, 2026 · arXiv
When Actions Go Off-Task: Detecting and Correcting Misaligned Actions in Computer-Use Agents

February 9, 2026 · arXiv
MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments

February 3, 2026 · arXiv
SwipeGen: Bridging the Execution Gap in GUI Agents via Human-like Swipe Synthesis

January 26, 2026 · arXiv
SMAN-Bench: A Cross-System Benchmark for Mobile Agents under Single- and Multi-path, Ambiguous, and Noisy Tasks

January 26, 2026 · ICLR 2026 (Poster)
GUITester: Enabling GUI Agents for Exploratory Defect Discovery

January 8, 2026 · arXiv
ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands

December 31, 2025 · arXiv
MobileWorldBench: Towards Semantic World Modeling For Mobile Agents

December 16, 2025 · arXiv