Large Language Models Empowered Personalized Web Agents
Hongru Cai , Yongqi Li , Wenjie Wang , Fengbin Zhu , Xiaoyu Shen , Wenjie Li , Tat-Seng Chua
- 🏛 Institutions
- NUS , PolyU , USTC , Eastern Institute of Technology , Ningbo
- 📅 Date
- October 22, 2024
- 📑 Publisher
- WWW 2025
- 💻 Env
- Web
- 🔑 Keywords
TLDR
This paper formulates personalized web agents that condition on user profiles and historical web behaviors, introduces the PersonalWAB benchmark for evaluating that setting, and proposes the PUMA alignment method. PUMA uses a memory bank with task-specific retrieval plus fine-tuning and preference optimization to improve user-dependent action execution.
Related papers (24)
- Persona2Web: Benchmarking Personalized Web Agents for Contextual Reasoning with User HistoryFebruary 19, 2026 · arXiv
- KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent EvaluationApril 9, 2026 · arXiv
- PSPA-Bench: A Personalized Benchmark for Smartphone GUI AgentMarch 31, 2026 · arXiv
- PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent with Long-Term User-Centric RecordsJanuary 14, 2026 · arXiv
- Odysseys: Benchmarking Web Agents on Realistic Long Horizon TasksApril 27, 2026 · arXiv
- WebForge: Breaking the Realism-Reproducibility-Scalability Trilemma in Browser Agent BenchmarkApril 13, 2026 · arXiv
- The Amazing Agent Race: Strong Tool Users, Weak NavigatorsApril 11, 2026 · arXiv
- ClawBench: Can AI Agents Complete Everyday Online Tasks?April 9, 2026 · arXiv
- GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game AgentsApril 8, 2026 · arXiv
- WebSP-Eval: Evaluating Web Agents on Website Security and Privacy TasksApril 7, 2026 · arXiv
- The Art of Building Verifiers for Computer Use AgentsApril 5, 2026 · arXiv
- When Users Change Their Mind: Evaluating Interruptible Agents in Long-Horizon Web NavigationApril 1, 2026 · arXiv
- WebArena-Infinity: Generating Browser Environments with Verifiable Tasks at ScaleMarch 2026 · Blog Post
- Vision2Web: A Hierarchical Benchmark for Visual Website Development with Agent VerificationMarch 27, 2026 · arXiv
- WebTestBench: Evaluating Computer-Use Agents towards End-to-End Automated Web TestingMarch 26, 2026 · arXiv
- Ego2Web: A Web Agent Benchmark Grounded in Egocentric VideosMarch 23, 2026 · CVPR 2026
- WebPII: Benchmarking Visual PII Detection for Computer-Use AgentsMarch 18, 2026 · arXiv
- WebChain: A Large-Scale Human-Annotated Dataset of Real-World Web Interaction TracesMarch 5, 2026 · arXiv
- TimeWarp: Evaluating Web Agents by Revisiting the PastMarch 5, 2026 · arXiv
- PATHWAYS: Evaluating Investigation and Context Discovery in AI Web AgentsFebruary 5, 2026 · arXiv
- WebTrap Park: An Automated Platform for Systematic Security Evaluation of Web AgentsJanuary 13, 2026 · arXiv
- WebGym: Scaling Training Environments for Visual Web Agents with Realistic TasksJanuary 5, 2026 · arXiv
- It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web AgentsDecember 29, 2025 · arXiv
- DECEPTICON: How Dark Patterns Manipulate Web AgentsDecember 28, 2025 · arXiv