Learning Nested Differences of Intersection-Closed Concept Classes

This paper introduces a new framework for constructing learning algorithms. Our methods involve master algorithms that use learning algorithms for intersection-closed concept classes as subroutines. For example, we give a master algorithm capable of learning any concept class whose members can be expressed as nested differences (e.g., $c_1 - (c_2 - (c_3 - (c_4 - c_5)))$) of concepts from an intersection-closed class. We show that our algorithms are optimal or nearly optimal with respect to several different criteria, including the number of examples needed to produce a good hypothesis with high confidence, the worst-case total number of mistakes made, and the expected number of mistakes made in the first $t$ trials.
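
The peeling idea behind such a master algorithm can be sketched concretely. Below is a minimal, illustrative Python sketch, not the paper's exact procedure: it instantiates the intersection-closed class as axis-parallel boxes (whose closure operator is simply the bounding box of a point set), and all names (`Box`, `closure`, `learn_nested_difference`, `max_depth`) are our own. It also assumes the sample is consistent with a nested difference of bounded depth.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, ...]

@dataclass
class Box:
    lo: Point
    hi: Point

    def contains(self, x: Point) -> bool:
        return all(l <= v <= h for l, v, h in zip(self.lo, x, self.hi))

def closure(points: List[Point]) -> Box:
    # Smallest box containing `points`: the closure operator of this
    # intersection-closed class (the intersection of all boxes containing them).
    dims = range(len(points[0]))
    return Box(tuple(min(p[d] for p in points) for d in dims),
               tuple(max(p[d] for p in points) for d in dims))

def learn_nested_difference(sample: List[Tuple[Point, bool]],
                            max_depth: int = 8) -> List[Box]:
    # Peel the nested difference one layer at a time.  Layer 1 is the closure
    # of all positive examples; examples of the opposite label that fall
    # inside it become the "positives" of layer 2, and so on with alternating
    # labels.  Assumes the sample is consistent with a nested difference of
    # depth at most `max_depth` (a hypothetical cap, added so the sketch
    # always halts).
    layers: List[Box] = []
    current, want = sample, True
    for _ in range(max_depth):
        positives = [x for x, y in current if y == want]
        if not positives:
            break
        h = closure(positives)
        layers.append(h)
        # Only examples inside h reach the next layer; the target label flips.
        current = [(x, y) for x, y in current if h.contains(x)]
        want = not want
    return layers

def predict(layers: List[Box], x: Point) -> bool:
    # The layers are nested (each closure lies inside the previous one), so a
    # point is labeled positive iff it penetrates an odd number of layers.
    depth = 0
    for h in layers:
        if not h.contains(x):
            break
        depth += 1
    return depth % 2 == 1

if __name__ == "__main__":
    # 1D toy sample consistent with [0,10] - ([4,6] - [5,5]).
    sample = [((0.0,), True), ((10.0,), True), ((4.0,), False),
              ((6.0,), False), ((5.0,), True)]
    layers = learn_nested_difference(sample)
    print([(b.lo, b.hi) for b in layers])   # [0,10], [4,6], [5,5]
    print(predict(layers, (2.0,)))          # True  (inside layer 1 only)
    print(predict(layers, (4.5,)))          # False (inside layers 1 and 2)
```

The nesting of the layers is what makes the parity rule well defined: for any intersection-closed class, the closure of a set of points is contained in every concept that contains those points, so each layer's closure lies inside the previous one.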
