On Weak Learning

An algorithm is a weak learning algorithm if with some small probability it outputs a hypothesis with error slightly below 50%. This paper presents relationships between weak learning, weak prediction (where the probability of being correct is slightly larger than 50%), and consistency oracles (which decide whether or not a given set of examples is consistent with a concept in the class). Our main result is a simple polynomial prediction algorithm which makes only a single query to a consistency oracle and whose predictions have a polynomial edge over random guessing. We compare this prediction algorithm with several of the standard prediction techniques, deriving an improved worst-case bound on the Gibbs algorithm in the process. We use our algorithm to show that a concept class is polynomially learnable if and only if there is a polynomial probabilistic consistency oracle for the class. Since strong learning algorithms can be built from weak learning algorithms, our results also characterize strong learnability.
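
To make the notion of a consistency oracle concrete, here is a minimal sketch for one toy concept class. The class (threshold functions on the real line, where a point x is labeled 1 exactly when x >= t for some threshold t) and the function name `threshold_consistency_oracle` are assumptions chosen for illustration; they do not come from the paper, and this is not the paper's prediction algorithm.

```python
from typing import List, Tuple

# A labeled example is a (point, label) pair with label 0 or 1.
Example = Tuple[float, int]


def threshold_consistency_oracle(examples: List[Example]) -> bool:
    """Decide whether some threshold concept labels every example correctly.

    Hypothetical toy class: c_t(x) = 1 if x >= t, else 0.
    A sample is consistent with the class exactly when every
    negatively labeled point lies strictly below every positively
    labeled point.
    """
    positives = [x for x, y in examples if y == 1]
    negatives = [x for x, y in examples if y == 0]
    if not positives or not negatives:
        # One-sided samples are always consistent: put the threshold
        # below all positives or above all negatives.
        return True
    return max(negatives) < min(positives)


if __name__ == "__main__":
    print(threshold_consistency_oracle([(0.2, 0), (0.7, 1)]))  # consistent
    print(threshold_consistency_oracle([(0.2, 1), (0.7, 0)]))  # not consistent
```

The oracle only answers yes or no for a whole labeled sample; it never returns a hypothesis, which is what separates it from a learning algorithm.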
