Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents

Pranav Putta, Edmund Mills, Naman Garg, Sumeet Motwani, Chelsea Finn, Divyansh Garg, Rafael Rafailov

🏛 Institutions: The AGI Company (MultiOn), Stanford
📅 Date: August 13, 2024
📑 Publisher: arXiv
💻 Env: Web
🔑 Keywords: reinforcement learning, MCTS, self-critique, off-policy DPO, WebShop, online search, Agent Q

TLDR

Agent Q combines guided Monte Carlo Tree Search (MCTS), self-critique, and off-policy Direct Preference Optimization (DPO) to learn from both successful and failed web-agent trajectories. It improves performance on WebShop and raises the success rate on a long-horizon booking task from 18.6% to 81.7% after one day of data collection, reaching 95.4% when online search is enabled.
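
To make the DPO step concrete, here is a minimal sketch of the standard DPO loss for a single preference pair. It assumes the pair comes from two sibling branches of the search tree, the better one as "chosen" and the worse one as "rejected". How Agent Q ranks branches is not described in this entry, so that pairing is an assumption. The function name, argument names, and the value of `beta` are illustrative and not taken from the paper.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair of sequence log-probabilities.

    logp_*     : log-probability of each branch under the policy being trained
    ref_logp_* : log-probability of each branch under the frozen reference model
    beta       : strength of the implicit KL constraint (illustrative default)
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # branch over the rejected one, measured relative to the reference model.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    x = beta * margin
    # -log(sigmoid(x)) == log(1 + exp(-x)), computed in a numerically stable form.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# Hypothetical log-probabilities for a successful and a failed branch.
loss = dpo_loss(logp_chosen=-1.0, logp_rejected=-5.0,
                ref_logp_chosen=-3.0, ref_logp_rejected=-3.0)
print(loss)
```

When the policy and the reference model agree on both branches the margin is zero and the loss equals log 2. The loss falls below that value as the policy shifts probability toward the chosen branch. Because the loss only needs log-probabilities of stored trajectories, it can be computed off-policy on data collected earlier.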
