SteP: Stacked LLM Policies for Web Actions
Paloma Sodhi, S.R.K. Branavan, Yoav Artzi, Ryan McDonald
- 🏛 Institutions
- ASAPP Research, Cornell
- 📅 Date
- October 5, 2023
- 📑 Publisher
- COLM 2024
- 💻 Env
- Web
TLDR
SteP is a web-agent framework that composes LLM policies through an explicit control stack rather than a single monolithic prompt. It evaluates on WebArena, MiniWoB++, and a CRM environment, and substantially improves WebArena performance over prior GPT-4-based baselines.
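The control-stack idea can be illustrated with a minimal sketch. It is not the paper's implementation: the names `Act`, `Call`, `Done`, `scripted`, and `run` are hypothetical, and scripted policies stand in for LLM calls. The policy on top of the stack decides each step; it can emit a web action, push another policy, or pop itself and return a result to its caller.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Act:
    """A primitive web action, e.g. a click or a keystroke (hypothetical format)."""
    command: str


@dataclass
class Call:
    """Hand control to another policy by pushing it onto the stack."""
    policy: "Policy"


@dataclass
class Done:
    """Finish the current policy and return a result to its caller."""
    result: str


Step = Union[Act, Call, Done]
# A policy maps (observation, result returned by a finished sub-policy) to one step.
Policy = Callable[[str, str], Step]


def scripted(steps: List[Step]) -> Policy:
    """Stand-in for an LLM policy: replays a fixed list of steps in order."""
    it = iter(steps)

    def policy(observation: str, returned: str) -> Step:
        return next(it)

    return policy


def run(root: Policy, max_steps: int = 20) -> List[str]:
    """Run policies from an explicit stack and collect the emitted web actions."""
    stack: List[Policy] = [root]
    trace: List[str] = []
    observation, returned = "start page", ""
    for _ in range(max_steps):
        if not stack:
            break  # the root policy has finished
        step = stack[-1](observation, returned)  # only the top policy acts
        returned = ""
        if isinstance(step, Act):
            trace.append(step.command)
            observation = f"page after {step.command}"  # toy environment
        elif isinstance(step, Call):
            stack.append(step.policy)  # the sub-policy takes over
        else:
            stack.pop()  # control returns to the caller
            returned = step.result
    return trace


# Assumed toy task: a root policy delegates the search, then acts itself.
search = scripted([Act("type [search] 'order 123'"), Act("click [submit]"), Done("order found")])
root = scripted([Call(search), Act("click [refund]"), Done("refund issued")])
trace = run(root)
print(trace)
```

Keeping the stack explicit means each policy only needs instructions for its own sub-task, which is the contrast with a single monolithic prompt drawn above.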
Related papers
- ColorBrowserAgent: Complex Long-Horizon Browser Agent with Adaptive Knowledge Evolution · January 12, 2026 · arXiv
- Inducing Programmatic Skills for Agentic Tasks · April 9, 2025 · COLM 2025
- The Tool Illusion: Rethinking Tool Use in Web Agents · April 3, 2026 · arXiv
- When Users Change Their Mind: Evaluating Interruptible Agents in Long-Horizon Web Navigation · April 1, 2026 · arXiv
- WebArena-Infinity: Generating Browser Environments with Verifiable Tasks at Scale · March 2026 · Blog Post
- AI Planning Framework for LLM-Based Web Agents · March 13, 2026 · arXiv