GUI Agents Papers

GAIA: a benchmark for General AI Assistants

Grégoire Mialon, Clémentine Fourrier, Craig Swift, Thomas Wolf, Yann LeCun, Thomas Scialom

🏛 Institutions
Meta AI, Hugging Face, AutoGPT
📅 Date
November 21, 2023
📑 Publisher
ICLR 2024
💻 Env
🔑 Keywords
TLDR

GAIA is a benchmark for evaluating general-purpose AI assistants. It tests assistant models on real-world tasks that combine multiple modalities, complex reasoning, tool usage, and open-ended question answering. Its 466 questions span various domains and highlight the gap between current AI performance and human capability, presenting a significant challenge for large language models such as GPT-4.

Related papers (24)