Best Arm Identification for Contaminated Bandits

This paper studies active learning in the context of robust statistics. Specifically, we propose a variant of the Best Arm Identification problem for \emph{contaminated bandits}, where each arm pull has probability $\varepsilon$ of generating a sample from an arbitrary contamination distribution instead of the true underlying distribution. The goal is to identify the best (or approximately best) true distribution with high probability, with a secondary goal of providing guarantees on the quality of this distribution. The primary challenge of the contaminated bandit setting is that the true distributions are only partially identifiable, even with infinite samples. To address this, we develop tight, non-asymptotic sample complexity bounds for high-probability estimation of the first two robust moments (median and median absolute deviation) from contaminated samples. These concentration inequalities are the main technical contributions of the paper and may be of independent interest. Using these results, we adapt several classical Best Arm Identification algorithms to the contaminated bandit setting and derive sample complexity upper bounds for our problem. Finally, we provide matching information-theoretic lower bounds on the sample complexity (up to a small logarithmic factor).
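To make the setup concrete, below is a minimal simulation sketch of the $\varepsilon$-contamination model described above and of the plug-in robust moments (median and median absolute deviation) the paper estimates from contaminated pulls. This is an illustration only, not the paper's algorithm: the function names (`pull_contaminated_arm`, `robust_moments`), the Gaussian choice for the true arm distribution, and the point-mass contamination are all assumptions made for the example.

```python
import numpy as np

def pull_contaminated_arm(rng, mu, sigma, eps, contaminate):
    """Draw one sample from an eps-contaminated arm: with probability
    1 - eps the sample comes from the true distribution (here N(mu, sigma^2)),
    and with probability eps from an arbitrary contamination source."""
    if rng.random() < eps:
        return contaminate(rng)       # arbitrary / adversarial outlier
    return rng.normal(mu, sigma)      # true underlying distribution

def robust_moments(samples):
    """Plug-in estimates of the first two robust moments:
    the median (location) and the median absolute deviation (scale)."""
    samples = np.asarray(samples)
    med = np.median(samples)
    mad = np.median(np.abs(samples - med))
    return med, mad

# Example: estimate the robust moments of one arm under 10% contamination.
rng = np.random.default_rng(0)
pulls = [pull_contaminated_arm(rng, mu=1.0, sigma=1.0, eps=0.10,
                               contaminate=lambda r: 50.0)  # gross outliers
         for _ in range(2000)]
print(robust_moments(pulls))  # median stays near 1.0 despite the outliers
```

Because the contamination distribution is arbitrary, the sample mean of such pulls can be driven arbitrarily far from the true mean, which is why the sketch (and the paper) targets the median and MAD instead; even these are only identifiable up to an $\varepsilon$-dependent interval, matching the partial-identifiability point above.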
