The Strength of Weak Learnability

This paper addresses the problem of improving the accuracy of an hypothesis output by a learning algorithm in the distribution-free (PAC) learning model. A concept class is learnable (or strongly learnable) if, given access to a source of examples of the unknown concept, the learner with high probability is able to output an hypothesis that is correct on all but an arbitrarily small fraction of the instances. The concept class is weakly learnable if the learner can produce an hypothesis that performs only slightly better than random guessing. In this paper, it is shown that these two notions of learnability are equivalent. A method is described for converting a weak learning algorithm into one that achieves arbitrarily high accuracy. This construction may have practical applications as a tool for efficiently converting a mediocre learning algorithm into one that performs extremely well. In addition, the construction has some interesting theoretical consequences, including a set of general upper bounds on the complexity of any strong learning algorithm as a function of the allowed error ε.
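As a rough illustration of the construction described above (a minimal sketch only, not the paper's exact algorithm or analysis), the following Python code shows one boosting step: the weak learner is run three times, on the original distribution, on a distribution re-weighted so that half its mass falls on examples the first hypothesis gets wrong, and on examples where the first two hypotheses disagree, and the three resulting hypotheses are combined by majority vote. The `weak_learn` and `draw_example` interfaces and all names here are illustrative assumptions, not the paper's notation.

```python
import random

def boost_once(weak_learn, draw_example, n):
    """Combine three weak hypotheses into one more accurate hypothesis.

    weak_learn(sample) -> h, where h(x) returns a label in {0, 1}   (assumed interface)
    draw_example()     -> (x, label) drawn from the target distribution (assumed interface)
    n                  -> sample size handed to the weak learner
    """
    # h1: weak hypothesis trained on the original distribution.
    h1 = weak_learn([draw_example() for _ in range(n)])

    # h2: trained on a filtered distribution that puts half its weight on
    # examples h1 classifies correctly and half on examples h1 gets wrong.
    def draw_balanced():
        want_correct = random.random() < 0.5
        while True:
            x, y = draw_example()
            if (h1(x) == y) == want_correct:
                return x, y

    h2 = weak_learn([draw_balanced() for _ in range(n)])

    # h3: trained only on examples where h1 and h2 disagree.  (If they almost
    # never disagree, their majority is already accurate; this sketch simply
    # caps the rejection-sampling attempts rather than handling that case.)
    def draw_disagreement(max_tries=10000):
        for _ in range(max_tries):
            x, y = draw_example()
            if h1(x) != h2(x):
                return x, y
        return draw_example()

    h3 = weak_learn([draw_disagreement() for _ in range(n)])

    # Final hypothesis: majority vote of the three weak hypotheses.
    return lambda x: int(h1(x) + h2(x) + h3(x) >= 2)
```

If each call to the weak learner returns a hypothesis with error at most β on the distribution it is trained on, the majority vote above has error roughly 3β² − 2β³; applying the step recursively, with boosted hypotheses fed back in place of the weak learner, is what drives the error below any desired ε.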
