GAIA: a benchmark for General AI Assistants

Grégoire Mialon, Clémentine Fourrier, Craig Swift, Thomas Wolf, Yann LeCun, Thomas Scialom

🏛 Institutions: Meta AI, Hugging Face, AutoGPT
📅 Date: November 21, 2023
📑 Publisher: ICLR 2024
💻 Env
🔑 Keywords: benchmark, multi-modality, tool use, reasoning

TLDR

GAIA is a benchmark for evaluating general-purpose AI assistants. It tests assistant models on real-world questions that span multiple modalities and require complex reasoning, tool use, and open-ended question answering. Its 466 questions cover a range of domains and expose a wide gap between current AI systems and humans: the paper reports that human respondents score 92%, against 15% for GPT-4 equipped with plugins.
