Lower Bounds on Learning Decision Lists and Trees

k-Decision lists and decision trees play important roles in learning theory as well as in practical learning systems. k-Decision lists generalize classes such as monomials, k-DNF, and k-CNF, and like these subclasses they are polynomially PAC-learnable [R. Rivest, Mach. Learning 2 (1987), 229–246]. This leaves open the question of whether k-decision lists can be learned as efficiently as k-DNF. We answer this question negatively in a certain sense, thus disproving a claim in a popular textbook [M. Anthony and N. Biggs, "Computational Learning Theory," Cambridge Univ. Press, Cambridge, UK, 1992]. Decision trees, on the other hand, are not even known to be polynomially PAC-learnable, despite their widespread practical application. We will show that decision trees are not likely to be efficiently PAC-learnable. We summarize our specific results. The following problems cannot be approximated in polynomial time within a factor of $2^{\log^{\delta} n}$ for any $\delta < 1$, unless $\mathrm{NP} \subseteq \mathrm{DTIME}[2^{\mathrm{polylog}\, n}]$: a generalized set cover, k-decision lists, k-decision lists by monotone decision lists, and decision trees. Decision lists cannot be approximated in polynomial time within a factor of $n^{\delta}$, for some constant $\delta > 0$, unless $\mathrm{NP} = \mathrm{P}$. Also, k-decision lists with $\ell$ 0–1 alternations cannot be approximated within a factor of $\log^{\ell} n$ unless $\mathrm{NP} \subseteq \mathrm{DTIME}[n^{O(\log \log n)}]$ (providing an interesting comparison to the upper bound obtained by A. Dhagat and L. Hellerstein in ["FOCS '94," pp. 64–74]).
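To make the objects concrete, the following is a minimal Python sketch of a k-decision list and of the greedy consistency algorithm behind Rivest's PAC-learnability result cited above. It is an illustration under our own naming, not code from the paper: `satisfies`, `evaluate`, `learn_decision_list`, the literal encoding, and the toy target are all our assumptions.

```python
from itertools import combinations, product

# A k-decision list over n Boolean attributes is a sequence of rules
# (term, bit), where each term is a conjunction of at most k literals;
# on input x the list outputs the bit of the first rule x satisfies.

def satisfies(x, term):
    """term is a tuple of literals (index, value); all must hold in x."""
    return all(x[i] == v for i, v in term)

def evaluate(dlist, x, default=0):
    for term, bit in dlist:
        if satisfies(x, term):
            return bit
    return default

def learn_decision_list(sample, n, k):
    """Rivest-style greedy learner: repeatedly pick a term of size <= k
    that covers at least one remaining example and only examples of a
    single label, emit it as the next rule, and discard what it covers.
    Returns None if the sample is consistent with no k-decision list."""
    literals = [(i, v) for i in range(n) for v in (0, 1)]
    remaining, dlist = list(sample), []
    while remaining:
        for size in range(k + 1):
            term = next((t for t in combinations(literals, size)
                         if len({lab for x, lab in remaining
                                 if satisfies(x, t)}) == 1), None)
            if term is not None:
                labels = {lab for x, lab in remaining if satisfies(x, term)}
                dlist.append((term, labels.pop()))
                remaining = [(x, lab) for x, lab in remaining
                             if not satisfies(x, term)]
                break
        else:
            return None  # no consistent rule exists for the rest
    return dlist

# Toy target: f(x) = x0 OR (x1 AND x2) -- a 2-DNF, hence a 2-decision list.
sample = [(x, x[0] | (x[1] & x[2])) for x in product((0, 1), repeat=3)]
dl = learn_decision_list(sample, n=3, k=2)
assert all(evaluate(dl, x) == lab for x, lab in sample)
```

Rivest's analysis shows this greedy loop always makes progress when the sample is consistent with some k-decision list, which yields PAC learnability via an Occam-style argument; the hardness results above concern the different question of finding a nearly shortest such representation.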

[1] M. Kearns, M. Li, L. Pitt, and L. G. Valiant, "On the learnability of Boolean formulae," in Proc. 19th ACM Symposium on Theory of Computing (STOC), 1987.

[2] A. Ehrenfeucht and D. Haussler, "Learning decision trees from random examples," in Proc. 1st Workshop on Computational Learning Theory (COLT), 1988.

[3] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, "Classification and Regression Trees," Wadsworth, 1984.

[4] M. Anthony and N. Biggs, "Computational Learning Theory: An Introduction," Cambridge Univ. Press, Cambridge, UK, 1992.

[5] C. H. Papadimitriou and M. Yannakakis, "Optimization, approximation, and complexity classes," J. Comput. System Sci. 43 (1991), 425–440; preliminary version in Proc. 20th ACM STOC, 1988.

[6] M. R. Garey and D. S. Johnson, "Computers and Intractability: A Guide to the Theory of NP-Completeness," W. H. Freeman, San Francisco, 1979.

[7] L. Pitt and L. G. Valiant, "Computational limitations on learning from examples," J. ACM 35 (1988), 965–984.

[8] S. Arora, C. Lund, R. Motwani, M. Sudan, and M. Szegedy, "Proof verification and hardness of approximation problems," in Proc. 33rd IEEE Symposium on Foundations of Computer Science (FOCS), 1992.

[9] M. Bellare, S. Goldwasser, C. Lund, and A. Russell, "Efficient probabilistically checkable proofs and applications to approximations," in Proc. 25th ACM STOC, 1993.

[10] R. L. Rivest, "Learning decision lists," Machine Learning 2 (1987), 229–246.

[11] A. Dhagat and L. Hellerstein, "PAC learning with irrelevant attributes," in Proc. 35th IEEE FOCS, 1994, pp. 64–74.

[12] L. Hyafil and R. L. Rivest, "Constructing optimal binary decision trees is NP-complete," Inform. Process. Lett. 5 (1976), 15–17.

[13] D. Haussler, "Quantifying inductive bias: AI learning algorithms and Valiant's learning framework," Artificial Intelligence 36 (1988), 177–221.

[14] A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth, "Learnability and the Vapnik–Chervonenkis dimension," J. ACM 36 (1989), 929–965.

[15] L. G. Valiant, "A theory of the learnable," Comm. ACM 27 (1984), 1134–1142.

[16] J. R. Quinlan and R. L. Rivest, "Inferring decision trees using the minimum description length principle," Inform. and Comput. 80 (1989), 227–248.

[17] P. Auer, R. C. Holte, and W. Maass, "Theory and applications of agnostic PAC-learning with small decision trees," in Proc. 12th International Conference on Machine Learning (ICML), 1995.

[18] R. Board and L. Pitt, "On the necessity of Occam algorithms," Theoret. Comput. Sci. 100 (1992), 157–184.

[19] C. Lund and M. Yannakakis, "On the hardness of approximating minimization problems," in Proc. 25th ACM STOC, 1993.