Statistical queries and faulty PAC oracles

In this paper we study learning in the PAC model of Valiant [18] in which the example oracle used for learning may be faulty in one of two ways: either by misclassifying the example or by distorting the distribution of examples. We first consider models in which examples are misclassified. Kearns [12] recently showed that efficient learning in a new model using statistical queries is a sufficient condition for PAC learning with classification noise. We show that efficient learning with statistical queries is also sufficient for learning in the PAC model with a malicious error rate proportional to the required statistical query accuracy. One application of this result is a new lower bound on the tolerable malicious error rate for learning monomials of k literals; this is the first such bound that is independent of the number of irrelevant attributes n. We also use the statistical query model to give sufficient conditions for using distribution-specific algorithms on distributions outside their prescribed domains. A corollary of this result expands the class of distributions on which we can weakly learn monotone Boolean formulae. We also consider new models of learning in which examples are not chosen according to the distribution on which the learner will be tested. We examine three variations of distribution noise and give necessary and sufficient conditions for polynomial-time learning with such noise. We show containments and separations among the various models of faulty oracles. Finally, we examine hypothesis boosting algorithms in the context of learning with distribution noise, and show that Schapire's result regarding the strength of weak learnability [17] is in some sense tight in requiring the weak learner to be nearly distribution-free.
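As a rough illustration of the statistical query setting discussed above (this sketch is not taken from the paper; the oracle, target concept, and parameters are hypothetical stand-ins), a statistical query asks for an estimate of the probability that a predicate chi holds on a labeled example, to within an additive tolerance. Such a query can be simulated by drawing a sample from the (possibly faulty) example oracle and returning the empirical frequency:

```python
import math
import random


def simulate_statistical_query(chi, example_oracle, tolerance, delta=0.01):
    """Estimate Pr[chi(x, label)] over labeled examples drawn from
    example_oracle, to within additive error `tolerance` with probability
    at least 1 - delta (sample size from the Hoeffding bound)."""
    m = math.ceil(math.log(2.0 / delta) / (2.0 * tolerance ** 2))
    hits = 0
    for _ in range(m):
        x, label = example_oracle()   # possibly noisy / malicious example
        if chi(x, label):
            hits += 1
    return hits / m


# Hypothetical usage: a monomial target over n Boolean attributes, with a
# toy "malicious" oracle that returns an arbitrary labeled point with small
# probability beta.  If beta is small relative to the query tolerance, the
# empirical estimate can still fall within the tolerance the learner needs.
if __name__ == "__main__":
    n, beta = 5, 0.02

    def target(x):
        return x[0] == 1 and x[1] == 1   # the monomial on the first two attributes

    def malicious_oracle():
        x = tuple(random.randint(0, 1) for _ in range(n))
        if random.random() < beta:
            return x, random.randint(0, 1)   # corrupted example
        return x, int(target(x))             # honest example

    estimate = simulate_statistical_query(
        chi=lambda x, label: label == 1 and x[0] == 1,
        example_oracle=malicious_oracle,
        tolerance=0.05,
    )
    print("estimated Pr[label = 1 and first attribute = 1]:", estimate)
```

In the malicious-error model the adversary is not restricted to independent random corruption as in this toy oracle; the sketch only illustrates why the additive tolerance of a statistical query leaves room to absorb a small rate of bad examples.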

[1] Michael Kearns, et al. Computational complexity of machine learning, 1990, ACM Distinguished Dissertations.

[2] Alon Itai, et al. Learnability by fixed distributions, 1988, COLT '88.

[3] Thomas R. Hancock, et al. Learning 2µ DNF formulas and kµ decision trees, 1991, COLT '91.

[4] Peter L. Bartlett, et al. Investigating the distribution assumptions in the PAC learning model, 1991, COLT '91.

[5] Yoav Freund, et al. An improved boosting algorithm and its implications on learning complexity, 1992, COLT '92.

[6] Eyal Kushilevitz, et al. Learning by distances, 1990, COLT '90.

[7] Michael Kearns, et al. Efficient noise-tolerant learning from statistical queries, 1993, STOC.

[8] Ming Li, et al. Learning in the presence of malicious errors, 1988, STOC '88.

[9] David Haussler, et al. Quantifying inductive bias: AI learning algorithms and Valiant's learning framework, 1988, Artif. Intell.

[10] Peter L. Bartlett, et al. Learning with a slowly changing distribution, 1992, COLT '92.

[11] Leslie G. Valiant, et al. On the learnability of Boolean formulae, 1987, STOC.

[12] Leonard Pitt, et al. On the necessity of Occam algorithms, 1990, STOC '90.

[13] Leslie G. Valiant, et al. A theory of the learnable, 1984, STOC '84.

[14] Noam Nisan, et al. Constant depth circuits, Fourier transform, and learnability, 1989, 30th Annual Symposium on Foundations of Computer Science.

[15] Sean W. Smith, et al. Improved learning of AC0 functions, 1991, COLT '91.

[16] Alon Itai, et al. Dominating distributions and learnability, 1992, COLT '92.