Stochastic bandits robust to adversarial corruptions

We introduce a new model of stochastic bandits with adversarial corruptions that aims to capture settings where most of the input follows a stochastic pattern but some fraction of it can be adversarially changed to trick the algorithm, e.g., click fraud, fake reviews, and email spam. The goal of this model is to encourage the design of bandit algorithms that (i) work well in mixed adversarial and stochastic models, and (ii) degrade gracefully in performance as we move from fully stochastic to fully adversarial models. In our model, the rewards for all arms are initially drawn from a distribution and are then altered by an adaptive adversary. We provide a simple algorithm whose performance degrades gracefully with the total corruption the adversary injects into the data, measured as the sum across rounds of the biggest alteration the adversary makes in that round; this total corruption is denoted by C. Our algorithm retains the optimal guarantee (up to a logarithmic term) when the input is stochastic, and its performance degrades linearly with the amount of corruption C, while crucially remaining agnostic to C. We also provide a lower bound showing that this linear degradation is necessary for any algorithm that achieves optimal performance in the stochastic setting (the lower bound holds even when the amount of corruption is known, a special case in which our algorithm achieves the optimal guarantee without the extra logarithmic term).
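The corruption model above is concrete enough to sketch in code. Below is a minimal Python illustration, assuming Bernoulli arms with rewards in [0, 1]; the class name `CorruptedBandit`, the `attack` callback, and the specific adversary shown (suppressing the best arm early on) are hypothetical illustrations of the model, not constructions from the paper.

```python
import numpy as np

class CorruptedBandit:
    """K-armed stochastic bandit whose per-round rewards can be
    adversarially altered. Tracks the total corruption C as the sum
    over rounds of the largest alteration made in that round."""

    def __init__(self, means, seed=0):
        self.means = np.asarray(means, dtype=float)  # true Bernoulli means
        self.rng = np.random.default_rng(seed)
        self.C = 0.0  # total corruption injected so far

    def round(self, attack):
        # Rewards for ALL arms are drawn from their distributions first...
        stochastic = (self.rng.random(self.means.size) < self.means).astype(float)
        # ...and are then altered by the (possibly adaptive) adversary;
        # `attack` returns the corrupted reward vector, clipped to [0, 1].
        corrupted = np.clip(attack(stochastic.copy()), 0.0, 1.0)
        # This round's contribution to C: the biggest alteration made.
        self.C += float(np.max(np.abs(corrupted - stochastic)))
        return corrupted  # a learner would observe only its pulled arm's entry

# Example adversary: zero out the best arm's reward for the first 50 rounds.
bandit = CorruptedBandit(means=[0.9, 0.5, 0.3])
for t in range(1000):
    attack = (lambda r: np.where(np.arange(r.size) == 0, 0.0, r)) if t < 50 else (lambda r: r)
    bandit.round(attack)
print(f"total corruption C = {bandit.C}")  # at most 50 under this attack
```

Under this attack, C is bounded by the number of corrupted rounds, matching the abstract's guarantee that regret should grow only linearly in C rather than with the horizon.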
