GUI Agents Papers

GAIA: a benchmark for General AI Assistants

Grégoire Mialon, Clémentine Fourrier, Craig Swift, Thomas Wolf, Yann LeCun, Thomas Scialom

🏛 Institutions
Meta AI, Hugging Face, AutoGPT
📅 Date
November 21, 2023
📑 Publisher
ICLR 2024
TLDR

GAIA is a benchmark for evaluating general-purpose AI assistants. Its 466 questions span a range of domains and are set in real-world scenarios that call for multi-step reasoning, handling of multiple modalities, tool use, and open-ended question answering. The results expose a wide gap between current AI systems and human capability: in the paper's evaluation, human respondents answer 92% of the questions correctly, compared with 15% for GPT-4 equipped with plugins.
