Cross-Modal Content Optimization for Steering Web Agent Preferences

Tanqiu Jiang, Min Bai, Nikolaos Pappas, Yanjun Qi, Sandesh Swamy

🏛 Institutions: Stony Brook University, AWS AI Labs
📅 Date: October 4, 2025
📑 Publisher: arXiv
💻 Env: Web
🔑 Keywords: preference steering, black-box attack, multimodal attack, stealth, CPS

TLDR

This paper introduces Cross-Modal Preference Steering (CPS), a black-box attack that jointly perturbs an item's image and text to bias a web agent's ranking and selection decisions. Under a realistic threat model in which the attacker controls only their own listing's metadata, CPS outperforms prior baselines across GPT-4.1, Qwen-2.5VL, and Pixtral-Large while keeping detection rates much lower than those baselines.
