Predicting nearly as well as the best pruning of a decision tree through a dynamic programming scheme

Helmbold and Schapire gave an on-line prediction algorithm that, given an unpruned decision tree, produces predictions not much worse than those made by the best pruning of that tree. In this paper, we give two new on-line algorithms. The first is based on the observation that, in the "batch" setting where all the data to be predicted are given in advance, the best pruning can be found efficiently by dynamic programming. This algorithm works for a wide class of loss functions, whereas the algorithm of Helmbold and Schapire is described only for the absolute loss function. Moreover, our algorithm is so simple and general that it could be applied to many other on-line optimization problems that are solvable by dynamic programming. Our second algorithm is competitive not only with the best pruning but also with the best prediction values associated with the nodes of the decision tree. In this setting, we give a greatly simplified algorithm for the absolute loss function, which is easily generalized to the case where, instead of a decision tree, data are classified in some arbitrary fixed manner.
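To make the batch observation concrete, here is a minimal Python sketch of the pruning recursion; the `Node` class, the index-valued `split` routing convention, and the `loss(prediction, outcome)` interface are illustrative assumptions, not the paper's notation. At each node, the recursion compares the loss of stopping there (treating the node as a leaf) against the total loss of the best prunings of its subtrees, which is exactly the dynamic programming structure the first algorithm exploits on-line.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Node:
    # split routes an instance to a child by index; None marks a leaf
    split: Optional[Callable[[object], int]] = None
    children: Tuple["Node", ...] = ()
    prediction: float = 0.0  # prediction value associated with this node

def best_pruning(node: Node, data: List[Tuple[object, float]],
                 loss: Callable[[float, float], float]):
    """Return (minimum cumulative loss, ids of nodes kept as leaves)
    over all prunings of the subtree rooted at `node`, on `data`."""
    # Loss incurred if we prune here and let `node` predict for all its data.
    leaf_loss = sum(loss(node.prediction, y) for _, y in data)
    if not node.children:
        return leaf_loss, {id(node)}
    # Route the data down the split and solve each child independently.
    parts = [[] for _ in node.children]
    for x, y in data:
        parts[node.split(x)].append((x, y))
    child_results = [best_pruning(c, part, loss)
                     for c, part in zip(node.children, parts)]
    subtree_loss = sum(l for l, _ in child_results)
    if leaf_loss <= subtree_loss:  # pruning at `node` is at least as good
        return leaf_loss, {id(node)}
    leaves = set().union(*(s for _, s in child_results))
    return subtree_loss, leaves

# Example: a depth-1 stump with constant predictions at each node.
left, right = Node(prediction=0.0), Node(prediction=1.0)
root = Node(split=lambda x: 0 if x < 0.5 else 1,
            children=(left, right), prediction=0.5)
abs_loss = lambda p, y: abs(p - y)
data = [(0.1, 0.0), (0.2, 0.0), (0.9, 1.0)]
print(best_pruning(root, data, abs_loss))  # keeping the split beats pruning at the root
```

Because each example is routed to exactly one child at every split, the recursion charges each example to at most one node per level, so the whole computation is linear in the total routing work. This efficiency of the batch recursion is what the paper's first algorithm mirrors in the on-line setting.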

[1] Yoram Singer et al. Using and combining predictors that specialize. STOC '97, 1997.

[2] Robert E. Schapire et al. Predicting Nearly as Well as the Best Pruning of a Decision Tree. COLT, 1995.

[3] Eiji Takimoto et al. A Simple Algorithm for Predicting Nearly as Well as the Best Pruning Labeled with the Best Prediction Values of a Decision Tree. ALT, 1997.

[4] Nader H. Bshouty et al. Exact learning via the monotone theory. Proceedings of the 34th Annual Symposium on Foundations of Computer Science, 1993.

[5] Yishay Mansour et al. On the boosting ability of top-down decision tree learning algorithms. STOC '96, 1996.

[6] Alfredo De Santis et al. Learning probabilistic prediction functions. Proceedings of the 29th Annual Symposium on Foundations of Computer Science, 1988.

[7] David Haussler et al. Tight worst-case loss bounds for predicting with expert advice. EuroCOLT, 1994.

[8] Yoram Singer et al. An efficient extension to mixture techniques for prediction and decision trees. COLT '97, 1997.

[9] Nader H. Bshouty. Exact Learning Boolean Function via the Monotone Theory. Inf. Comput., 1995.

[10] Vladimir Vovk et al. Universal Forecasting Algorithms. Inf. Comput., 1992.

[11] Yishay Mansour et al. Weakly learning DNF and characterizing statistical query learning using Fourier analysis. STOC '94, 1994.

[12] Vladimir Vovk et al. Derandomizing Stochastic Prediction Strategies. COLT '97, 1997.

[13] J. Ross Quinlan. C4.5: Programs for Machine Learning, 1992.

[14] Erik Ordentlich et al. Universal portfolios with side information. IEEE Trans. Inf. Theory, 1996.

[15] Manfred K. Warmuth et al. The Weighted Majority Algorithm. Inf. Comput., 1994.

[16] Vladimir Vovk et al. Aggregating strategies. COLT '90, 1990.

[17] Vladimir Vovk et al. A game of prediction with expert advice. COLT '95, 1995.

[18] N. Littlestone. Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm. Proceedings of the 28th Annual Symposium on Foundations of Computer Science, 1987.

[19] Yoav Freund et al. A decision-theoretic generalization of on-line learning and an application to boosting. EuroCOLT, 1997.