Better Algorithms for Stochastic Bandits with Adversarial Corruptions

We study the stochastic multi-armed bandits problem in the presence of adversarial corruption. We present a new algorithm for this problem whose regret is nearly optimal, substantially improving upon previous work. Our algorithm is agnostic to the level of adversarial contamination and can tolerate a significant amount of corruption with virtually no degradation in performance.

[1]  Daniel M. Kane,et al.  Robust Estimators in High Dimensions without the Computational Intractability , 2016, 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS).

[2]  Frederick R. Forst,et al.  On robust estimation of the location parameter , 1980 .

[3]  Gregory Valiant,et al.  Learning from untrusted data , 2016, STOC.

[4]  Gábor Lugosi,et al.  An Improved Parametrization and Analysis of the EXP3++ Algorithm for Stochastic and Adversarial Bandits , 2017, COLT.

[5]  Nicolò Cesa-Bianchi,et al.  Bandits With Heavy Tail , 2012, IEEE Transactions on Information Theory.

[6]  P. J. Huber Robust Estimation of a Location Parameter , 1964 .

[7]  Santosh S. Vempala,et al.  Agnostic Estimation of Mean and Covariance , 2016, 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS).

[8]  T. L. Lai Andherbertrobbins Asymptotically Efficient Adaptive Allocation Rules , 1985 .

[9]  Alan Malek,et al.  Best Arm Identification for Contaminated Bandits , 2018, J. Mach. Learn. Res..

[10]  Leslie G. Valiant,et al.  Learning Disjunction of Conjunctions , 1985, IJCAI.

[11]  Purushottam Kar,et al.  Corruption-tolerant bandit learning , 2018, Machine Learning.

[12]  Peter Auer,et al.  Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.

[13]  Pravesh Kothari,et al.  Efficient Algorithms for Outlier-Robust Regression , 2018, COLT.

[14]  Michael R. Lyu,et al.  Almost Optimal Algorithms for Linear Stochastic Bandits with Heavy-Tailed Payoffs , 2018, NeurIPS.

[15]  Julian Zimmert,et al.  Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously , 2019, ICML.

[16]  John Langford,et al.  Contextual Bandit Algorithms with Supervised Learning Guarantees , 2010, AISTATS.

[17]  Peter Auer,et al.  An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits , 2016, COLT.

[18]  Sébastien Bubeck,et al.  Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems , 2012, Found. Trends Mach. Learn..

[19]  Aleksandrs Slivkins,et al.  25th Annual Conference on Learning Theory The Best of Both Worlds: Stochastic and Adversarial Bandits , 2022 .

[20]  Julian Zimmert,et al.  Tsallis-INF: An Optimal Algorithm for Stochastic and Adversarial Bandits , 2018, J. Mach. Learn. Res..

[21]  Ming Li,et al.  Learning in the presence of malicious errors , 1993, STOC '88.

[22]  Aleksandrs Slivkins,et al.  One Practical Algorithm for Both Stochastic and Adversarial Bandits , 2014, ICML.

[23]  Devdatt P. Dubhashi,et al.  Concentration of Measure for the Analysis of Randomized Algorithms: Contents , 2009 .

[24]  Shie Mannor,et al.  Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems , 2006, J. Mach. Learn. Res..

[25]  Peter Auer,et al.  The Nonstochastic Multiarmed Bandit Problem , 2002, SIAM J. Comput..

[26]  Renato Paes Leme,et al.  Stochastic bandits robust to adversarial corruptions , 2018, STOC.

[27]  Michael R. Lyu,et al.  Pure Exploration of Multi-Armed Bandits with Heavy-Tailed Payoffs , 2018, UAI.

[28]  Daniel M. Kane,et al.  Learning geometric concepts with nasty noise , 2017, STOC.