Adversarial Online Learning with noise

We present and study models of adversarial online learning where the feedback observed by the learner is noisy, and the feedback is either full information feedback or bandit feedback. Specifically, we consider binary losses xored with the noise, which is a Bernoulli random variable. We consider both a constant noise rate and a variable noise rate. Our main results are tight regret bounds for learning with noise in the adversarial online learning model.

[1]  D. Angluin,et al.  Learning From Noisy Examples , 1988, Machine Learning.

[2]  Jean-Yves Audibert,et al.  Minimax Policies for Adversarial and Stochastic Bandits. , 2009, COLT 2009.

[3]  Manfred K. Warmuth,et al.  The Weighted Majority Algorithm , 1994, Inf. Comput..

[4]  Leslie G. Valiant,et al.  Learning Disjunction of Conjunctions , 1985, IJCAI.

[5]  Santosh S. Vempala,et al.  Efficient algorithms for online decision problems , 2005, J. Comput. Syst. Sci..

[6]  Shie Mannor,et al.  From Bandits to Experts: On the Value of Side-Observations , 2011, NIPS.

[7]  Ming Li,et al.  Learning in the presence of malicious errors , 1993, STOC '88.

[8]  Yifan Wu,et al.  Online Learning with Gaussian Payoffs and Side Observations , 2015, NIPS.

[9]  Noga Alon,et al.  Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback , 2014, SIAM J. Comput..

[10]  Tsachy Weissman,et al.  Twofold universal prediction schemes for achieving the finite-state predictability of a noisy individual binary sequence , 2001, IEEE Trans. Inf. Theory.

[11]  Aleksandrs Slivkins,et al.  Introduction to Multi-Armed Bandits , 2019, Found. Trends Mach. Learn..

[12]  Gábor Lugosi,et al.  Prediction, learning, and games , 2006 .

[13]  Peter Auer,et al.  The Nonstochastic Multiarmed Bandit Problem , 2002, SIAM J. Comput..

[14]  Philip N. Klein,et al.  On the Number of Iterations for Dantzig-Wolfe Optimization and Packing-Covering Approximation Algorithms , 1999, SIAM J. Comput..

[15]  Sébastien Bubeck,et al.  Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems , 2012, Found. Trends Mach. Learn..

[16]  Michal Valko,et al.  Online Learning with Noisy Side Observations , 2016, AISTATS.

[17]  Emilie Kaufmann,et al.  Corrupt Bandits for Preserving Local Privacy , 2017, ALT.

[18]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.