Adversarial Attacks on Online Learning to Rank with Click Feedback

Online learning to rank (OLTR) is a sequential decision-making problem in which a learning agent selects an ordered list of items and receives feedback through user clicks. Although attacks on OLTR algorithms could cause serious losses in real-world applications, little is known about adversarial attacks on OLTR. This paper studies attack strategies against multiple variants of OLTR. Our first result is an attack strategy against the UCB algorithm on classical stochastic bandits with binary feedback, which resolves the key difficulties posed by bounded, discrete feedback that previous work cannot handle. Building on this result, we design attack algorithms against UCB-based OLTR algorithms in the position-based and cascade models. Finally, we propose a general attack strategy against any algorithm under the general click model. Each attack algorithm manipulates the learning agent into choosing the target attack item $T - o(T)$ times while incurring a cumulative attack cost of $o(T)$. Experiments on synthetic and real data further validate the effectiveness of the proposed attack algorithms.
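To make the setting concrete, the following is a minimal sketch of a reward-poisoning attack on UCB1 with Bernoulli (click/no-click) feedback, in the spirit of the attack described above but deliberately simplified: the attacker flips clicks on non-target arms to no-clicks, which is the only corruption available under binary feedback. The arm means, horizon, and the "flip every non-target click" rule are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical instance: 3 Bernoulli arms; arm 2 (the worst) is the attack target.
means = np.array([0.8, 0.6, 0.3])
K, T, target = len(means), 20000, 2

counts = np.zeros(K)  # pulls per arm
sums = np.zeros(K)    # post-attack rewards observed by the learner
cost = 0              # attack cost: number of flipped clicks

for t in range(1, T + 1):
    if t <= K:
        arm = t - 1  # pull each arm once to initialize
    else:
        ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
        arm = int(np.argmax(ucb))

    r = rng.binomial(1, means[arm])  # true binary click feedback

    # Simplified attack: flip every click on a non-target arm to a no-click,
    # so non-target empirical means collapse to 0 and UCB revisits those arms
    # only O(log T) times. Binary feedback means the attacker must flip whole
    # clicks; it cannot shift rewards by small continuous amounts.
    if arm != target and r == 1:
        cost += 1
        r = 0

    counts[arm] += 1
    sums[arm] += r

print(counts[target] / T)  # fraction of rounds the target item is chosen
print(cost / T)            # average per-round attack cost
```

Under this crude rule the learner pulls each non-target arm only $O(\log T)$ times, so the target is chosen $T - O(\log T)$ times at $O(\log T)$ total cost, consistent with the $T - o(T)$ pulls and $o(T)$ cost guarantees stated above (the paper's own strategy is more careful about minimizing cost).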
