M^2: Dual-Memory Augmentation for Long-Horizon Web Agents via Trajectory Summarization and Insight Retrieval

Dawei Yan, Haokui Zhang, Guangda Huzhang, Yang Li, Yibo Wang, Qing-Guo Chen, Zhao Xu, Weihua Luo, Ying Li, Wei Dong, Chunhua Shen

🏛 Institutions: Northwestern Polytechnical University, Alibaba Group, Xi'an University of Architecture and Technology, ZJU
📅 Date: February 28, 2026
📑 Publisher: arXiv
💻 Env: Web
🔑 Keywords: memory augmentation, trajectory summarization, insight retrieval, training-free, long-horizon tasks, M^2

TLDR

M^2 is a training-free memory augmentation method for long-horizon web agents that combines dynamic trajectory summarization with offline insight retrieval. It improves success rates on WebVoyager and OnlineMind2Web while substantially reducing token usage.


Related papers (24)

GPA: Learning GUI Process Automation from Demonstrations

April 2, 2026 · arXiv
ClawBench: Can AI Agents Complete Everyday Online Tasks?

April 9, 2026 · arXiv
WebATLAS: An LLM Agent with Experience-Driven Memory and Action Simulation

October 26, 2025 · NeurIPS 2025 Workshop on Language Agents and World Models
Improved GUI Grounding via Iterative Narrowing

November 18, 2024 · arXiv
Auto-Intent: Automated Intent Discovery and Self-Exploration for Large Language Model Web Agents

October 29, 2024 · Findings of EMNLP 2024
AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?

October 21, 2024 · EMNLP 2024 (Poster)
Workflow-GYM: Towards Long-Horizon Evaluation of Computer-use Agentic tasks in Real-World Professional Fields

June 9, 2026 · arXiv
STaR-KV: Spatio-Temporal Adaptive Re-weighting for KV Cache Compression in GUI Vision-Language Models

June 1, 2026 · arXiv
SimuWoB: Simulating Real-World Mobile Apps for Fast and Faithful GUI Agent Benchmarking

May 24, 2026 · arXiv
MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI Agents

May 18, 2026 · arXiv
UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding

April 15, 2026 · arXiv
CocoaBench: Evaluating Unified Digital Agents in the Wild

April 13, 2026 · arXiv
HealthAdminBench: Evaluating Computer-Use Agents on Healthcare Administration Tasks

April 10, 2026 · arXiv
Gym-Anything: Turn any Software into an Agent Environment

April 7, 2026 · arXiv
GUIDE: Resolving Domain Bias in GUI Agents through Real-Time Web Video Retrieval and Plug-and-Play Annotation

March 27, 2026 · arXiv
AndroTMem: From Interaction Trajectories to Anchored Memory in Long-Horizon GUI Agents

March 19, 2026 · arXiv
Zoom to Essence: Trainless GUI Grounding by Inferring upon Interface Elements

March 15, 2026 · arXiv
Trifuse: Enhancing Attention-Based GUI Grounding via Multimodal Fusion

February 6, 2026 · arXiv
Darwinian Memory: A Training-Free Self-Regulating Memory System for GUI Agent Evolution

January 30, 2026 · arXiv
OS-Marathon: Benchmarking Computer-Use Agents on Long-Horizon Repetitive Tasks

January 28, 2026 · arXiv
MobileBench-OL: A Comprehensive Chinese Benchmark for Evaluating Mobile GUI Agents in Real-World Environment

January 28, 2026 · arXiv
LongHorizonUI: A Unified Framework for Robust long-horizon Task Automation of GUI Agent

January 26, 2026 · ICLR 2026 (Poster)
MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive and MCP-Augmented Environments

December 22, 2025 · arXiv
MVP: Multiple View Prediction Improves GUI Grounding

December 9, 2025 · arXiv