Learning with stochastic inputs and adversarial outputs

Most research in online learning focuses either on the problem of adversarial classification (i.e., both inputs and labels are arbitrarily chosen by an adversary) or on the traditional supervised learning problem in which samples are independent and identically distributed according to a stationary probability distribution. Nonetheless, in a number of domains the relationship between inputs and outputs may be adversarial, whereas input instances are i.i.d. from a stationary distribution (e.g., user preferences). This scenario can be formalized as a learning problem with stochastic inputs and adversarial outputs. In this paper, we introduce this novel stochastic-adversarial learning setting and we analyze its learnability. In particular, we show that in a binary classification problem over a horizon of n rounds, given a hypothesis space H with finite VC-dimension, it is possible to design an algorithm that incrementally builds a suitable finite set of hypotheses from H used as input for an exponentially weighted forecaster and achieves a cumulative regret of order O(√(n VC(H) log n)) with overwhelming probability. This result shows that whenever inputs are i.i.d. it is possible to solve any binary classification problem using a finite VC-dimension hypothesis space with a sub-linear regret independently of the way labels are generated (either stochastic or adversarial). We also discuss extensions to multi-class classification, regression, learning from experts, and bandit settings with stochastic side information, as well as applications to games.
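As a rough illustration of the forecaster component described above, the following is a minimal sketch of an exponentially weighted (weighted-majority style) forecaster over a finite hypothesis set. The function name, the deterministic weighted-majority prediction rule, and the learning rate are illustrative assumptions; the paper's actual algorithm additionally builds the finite hypothesis set incrementally from H, which is omitted here.

```python
import math

def exp_weighted_forecaster(hypotheses, stream, eta):
    """Sketch of an exponentially weighted forecaster for binary labels.

    hypotheses: finite list of callables h(x) -> {0, 1}
    stream:     iterable of (x, y) pairs; y is revealed after predicting
    eta:        learning rate (illustrative; theory suggests eta ~ sqrt(log|H'|/n))
    Returns the forecaster's total number of mistakes.
    """
    weights = [1.0] * len(hypotheses)  # uniform prior over the finite set
    mistakes = 0
    for x, y in stream:
        preds = [h(x) for h in hypotheses]
        w1 = sum(w for w, p in zip(weights, preds) if p == 1)
        w0 = sum(weights) - w1
        pred = 1 if w1 >= w0 else 0  # weighted-majority prediction
        mistakes += int(pred != y)
        # exponential update: discount hypotheses that erred on this round
        weights = [w * math.exp(-eta * int(p != y))
                   for w, p in zip(weights, preds)]
    return mistakes
```

For instance, with a hypothesis set containing one perfect classifier and two constant classifiers, the forecaster quickly down-weights the erring hypotheses and tracks the best one, regardless of how the labels relate to the inputs.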
