Efficient noise-tolerant learning from statistical queries

In this paper, we study the problem of learning in the presence of classification noise in the probabilistic learning model of Valiant and its variants. To identify the class of “robust” learning algorithms in the most general way, we formalize a new but related model of learning from statistical queries. Intuitively, in this model a learning algorithm is forbidden to examine individual examples of the unknown target function, but is given access to an oracle providing estimates of probabilities over the sample space of random examples. One of our main results shows that any class of functions learnable from statistical queries is in fact learnable with classification noise in Valiant’s model, with a noise rate approaching the information-theoretic barrier of 1/2. We then demonstrate the generality of the statistical query model, showing that practically every class learnable in Valiant’s model and its variants can also be learned in the new model (and thus can be learned in the presence of noise). A notable exception to this statement is the class of parity functions, which we prove is not learnable from statistical queries, and for which no noise-tolerant algorithm is known.
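
To make the two oracles in the abstract concrete, the following is a minimal Python sketch under assumed, illustrative names (noisy_example, stat_oracle, chi, tau, eta are ours, not the paper’s). It shows a classification-noise example oracle that flips the true label with probability eta, and a STAT-style oracle that answers a query predicate to within an additive tolerance by averaging fresh noise-free examples. It does not implement the paper’s actual construction, which simulates statistical queries from noisy examples and corrects the estimates for the noise rate.

```python
import math
import random

def noisy_example(draw_x, target, eta, rng):
    """Classification-noise example oracle (sketch): draw x from the instance
    distribution, then flip the true Boolean label target(x) with probability eta."""
    x = draw_x(rng)
    y = target(x)
    if rng.random() < eta:
        y = 1 - y
    return x, y

def stat_oracle(chi, tau, draw_x, target, rng, delta=0.01):
    """STAT-style oracle (sketch): estimate P[chi(x, target(x)) = 1] to within
    additive tolerance tau, with failure probability at most delta, by averaging
    over fresh noise-free labeled examples.  Sample size from a Hoeffding bound."""
    m = math.ceil(math.log(2.0 / delta) / (2.0 * tau * tau))
    hits = 0
    for _ in range(m):
        x = draw_x(rng)
        hits += 1 if chi(x, target(x)) else 0
    return hits / m

# Toy usage: the target is a conjunction of two bits; the query asks how often
# a fixed two-bit parity agrees with the target label on uniform 3-bit inputs.
if __name__ == "__main__":
    rng = random.Random(0)
    draw_x = lambda r: tuple(r.randint(0, 1) for _ in range(3))
    target = lambda x: x[0] & x[1]                 # unknown target concept
    chi = lambda x, label: (x[0] ^ x[2]) == label  # statistical query predicate
    print(stat_oracle(chi, tau=0.05, draw_x=draw_x, target=target, rng=rng))
    print(noisy_example(draw_x, target, eta=0.2, rng=rng))
```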
