DrawingBench: Evaluating Spatial Reasoning and UI Interaction Capabilities of Large Language Models through Mouse-Based Drawing Tasks

🏛 Institutions: Independent
📅 Date: December 1, 2025
📑 Publisher: AAAI 2026 TrustAgent Workshop
💻 Env: General GUI
🔑 Keywords: benchmark, spatial reasoning, mouse-based drawing, verifiable evaluation, multi-turn feedback, DrawingBench

TLDR

DrawingBench evaluates agentic models through mouse-based drawing tasks that require issuing low-level GUI actions on a canvas UI rather than answering static spatial questions. It provides 250 prompts, deterministic rule-based scoring, and multi-turn external feedback. The results show strong baseline performance but also clear failure modes in tool-state management and long-horizon control.
