Adversarial Attacks on Linear Contextual Bandits

Contextual bandit algorithms are applied in a wide range of domains, from advertising to recommender systems, from clinical trials to education. In many of these domains, malicious agents may have an incentive to attack the bandit algorithm in order to induce a desired behavior. For instance, an unscrupulous ad publisher may try to increase their own revenue at the expense of the advertisers; a seller may want to increase the exposure of their products, or thwart a competitor's advertising campaign. In this paper, we study several attack scenarios and show that a malicious agent can force a linear contextual bandit algorithm to pull any desired arm $T - o(T)$ times over a horizon of $T$ steps, while applying adversarial modifications to either rewards or contexts whose cumulative magnitude grows only logarithmically, as $O(\log T)$. We also investigate the case in which the malicious agent is interested in affecting the behavior of the bandit algorithm in a single context (e.g., a specific user). We first provide sufficient conditions for the attack to be feasible and then propose an efficient algorithm to perform it. We validate our theoretical results through experiments on both synthetic and real-world datasets.
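As a concrete illustration of the reward-poisoning scenario, the following Python sketch pits an idealized attacker against a simple LinUCB-style learner. This is not the paper's algorithm: for brevity the attacker is assumed to know the true parameter of the target arm (an oracle assumption), and all names (d, n_arms, target_arm, margin) are illustrative. The point is that depressing the observed rewards of non-target arms drives the learner onto the target arm, and the cumulative perturbation stays small because non-target pulls become rare.

import numpy as np

# Hedged sketch of a reward-poisoning attack on a LinUCB-style learner.
# NOT the paper's exact attack: the attacker here has oracle knowledge of
# the target arm's true parameter, purely to keep the example short.

rng = np.random.default_rng(0)
d, n_arms, T = 5, 4, 2000
target_arm = 0
theta = rng.normal(size=(n_arms, d))            # true per-arm parameters

# LinUCB statistics: one ridge-regression estimate per arm.
A = np.stack([np.eye(d) for _ in range(n_arms)])  # Gram matrices
b = np.zeros((n_arms, d))
alpha = 1.0                                        # exploration-bonus scale

total_attack_cost = 0.0
target_pulls = 0

for t in range(T):
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)                       # context for this round

    # Learner: optimistic (UCB) score for each arm.
    ucb = np.empty(n_arms)
    for a in range(n_arms):
        A_inv = np.linalg.inv(A[a])
        theta_hat = A_inv @ b[a]
        bonus = alpha * np.sqrt(x @ A_inv @ x)
        ucb[a] = theta_hat @ x + bonus
    arm = int(np.argmax(ucb))

    reward = theta[arm] @ x + 0.1 * rng.normal()  # true noisy reward

    # Attacker: depress rewards of non-target arms so the target arm
    # looks best in hindsight. The total cost stays small because the
    # learner stops pulling non-target arms after few rounds.
    if arm != target_arm:
        margin = 0.1
        poisoned = min(reward, theta[target_arm] @ x - margin)
        total_attack_cost += abs(reward - poisoned)
        reward = poisoned
    else:
        target_pulls += 1

    A[arm] += np.outer(x, x)                     # update learner statistics
    b[arm] += reward * x

print(f"target pulls: {target_pulls}/{T}, attack cost: {total_attack_cost:.1f}")

Running this sketch, the target arm is pulled in the overwhelming majority of rounds while the attack cost grows far slower than T, mirroring the $T - o(T)$ pulls at $O(\log T)$ cost behavior stated in the abstract (here only as an empirical illustration under the stated oracle assumption).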
