PAC learning with nasty noise

We introduce a new model for learning in the presence of noise, which we call the Nasty Noise model. This model generalizes previously considered models of learning with noise. The learning process in this model, a variant of the PAC model, proceeds as follows. Suppose that the learning algorithm asks for m examples during its execution. The examples the algorithm receives are generated by a nasty adversary that works in the following steps. First, m examples are drawn independently according to a fixed distribution D (unknown to the learning algorithm), as in the PAC model. Then the powerful adversary, upon seeing the specific m examples that were drawn (and using its knowledge of the target function, the distribution D, and the learning algorithm), may remove a fraction of the examples of its choice and replace them by the same number of arbitrary examples of its choice; the m modified examples are then given to the learning algorithm. The only restriction on the adversary is that the number of examples it may modify is distributed according to a binomial distribution with parameters m and η (the noise rate).

On the negative side, we prove that no algorithm can achieve accuracy ε < 2η when learning any non-trivial class of functions. We also give lower bounds on the sample complexity required to achieve accuracy ε = 2η + Δ. On the positive side, we show that a polynomial number of examples (in the usual parameters and in 1/(ε − 2η)) suffices for learning any class of finite VC-dimension with accuracy ε > 2η. This algorithm may not be efficient; however, we also show that a fairly wide family of concept classes can be learned efficiently in the presence of nasty noise.
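To make the example-generation protocol concrete, here is a minimal Python sketch of a single call to the nasty-noise oracle. The names draw_example, target, and adversary are hypothetical placeholders introduced here for illustration (they do not appear in the paper); the sketch only encodes the three steps described above: draw m clean labeled examples from D, draw the corruption budget E from Binomial(m, η), and let the adversary replace up to E examples arbitrarily.

    import numpy as np

    def nasty_sample(m, eta, draw_example, target, adversary, rng=None):
        """Illustrative sketch of one round of the nasty-noise example oracle.

        m            -- number of examples requested by the learner
        eta          -- noise rate
        draw_example -- callable returning one instance drawn i.i.d. from D (assumed)
        target       -- the target function f; clean labels are f(x) (assumed)
        adversary    -- callable that sees the clean sample and a budget E and
                        returns a sample of size m in which at most E pairs have
                        been replaced by arbitrary (x, y) pairs (assumed)
        """
        rng = rng or np.random.default_rng()

        # Step 1: the clean sample -- m examples drawn i.i.d. from D, labeled by f.
        clean = [(x, target(x)) for x in (draw_example() for _ in range(m))]

        # Step 2: only the adversary's budget is random -- E ~ Binomial(m, eta).
        E = rng.binomial(m, eta)

        # Step 3: the adversary, knowing f, D, the learner, and the drawn sample,
        # may replace up to E of the pairs with arbitrary examples of its choice.
        return adversary(clean, E)

The point the sketch emphasizes is that, unlike classification or malicious noise, the corruption here is applied after the sample is drawn and in full view of it, so the adversary can both inject misleading examples and remove informative ones; only the size of the corrupted set is constrained.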
