Top-down decision tree learning as information based boosting

We consider a boosting technique that applies directly to multiclass classification problems. Although many boosting algorithms have been proposed, most are designed for binary classification, and handling multiclass problems with them requires some reduction to binary ones. To avoid such reductions, we introduce the notion of a pseudo-entropy function G that gives an information-theoretic criterion, called the conditional G-entropy, for measuring the loss of hypotheses. The conditional G-entropy turns out to be useful for defining the weakness of hypotheses that approximate, in some sense, a multiclass function, so that we can formulate the boosting problem without any reduction. We show that the top-down decision tree learning algorithm that uses the conditional G-entropy as its splitting criterion is an efficient boosting algorithm: rather than the classification error, the algorithm aims to minimize the conditional G-entropy. In the binary case, our algorithm coincides with the error-based boosting algorithm proposed by Kearns and Mansour, and our analysis yields a simpler proof of their results.
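To make the splitting criterion concrete, here is a minimal sketch, not the paper's implementation: a single greedy step that picks the axis-aligned threshold test minimizing the empirical conditional G-entropy. The pseudo-entropy G is instantiated as Shannon entropy for the multiclass case and as the Kearns-Mansour criterion G(q) = 2√(q(1−q)) for the binary case; the function names, the threshold-test hypothesis class, and the toy data are illustrative assumptions.

```python
# Sketch of one greedy step of top-down decision tree learning that
# minimizes a conditional pseudo-entropy (assumed instantiations of G).
import math
from collections import Counter

def shannon_G(labels):
    """Pseudo-entropy G: Shannon entropy of the empirical label distribution."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def kearns_mansour_G(labels):
    """Binary criterion G(q) = 2*sqrt(q(1-q)); labels assumed in {0, 1}."""
    n = len(labels)
    if n == 0:
        return 0.0
    q = sum(labels) / n  # fraction of positive labels
    return 2.0 * math.sqrt(q * (1.0 - q))

def conditional_G(split, labels, G):
    """Conditional G-entropy: leaf-size-weighted G over the induced leaves."""
    n = len(labels)
    left = [y for s, y in zip(split, labels) if s]
    right = [y for s, y in zip(split, labels) if not s]
    return (len(left) / n) * G(left) + (len(right) / n) * G(right)

def best_split(X, y, G):
    """Greedy step: the threshold test minimizing conditional G-entropy."""
    best = (None, G(y))  # (split description, criterion value at the root)
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            split = [x[j] <= t for x in X]
            if all(split) or not any(split):
                continue  # skip trivial splits sending everything one way
            value = conditional_G(split, y, G)
            if value < best[1]:
                best = ((j, t), value)
    return best

# Toy 3-class usage: one split already lowers G from log2(3) to 2/3.
X = [[0.1], [0.2], [0.8], [0.9], [0.5], [0.6]]
y = [0, 0, 1, 1, 2, 2]
print(best_split(X, y, shannon_G))  # e.g. ((0, 0.2), 0.666...)
```

Growing a tree by applying best_split recursively to the sample reaching each leaf gives a greedy top-down learner of the kind analyzed above; swapping shannon_G for kearns_mansour_G recovers a binary error-based criterion in the style of Kearns and Mansour.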

[1] Y. Mansour and D. McAllester. Boosting Using Branching Programs. J. Comput. Syst. Sci., 2000.

[2] Y. Freund and R. E. Schapire. Discussion of the paper "Additive logistic regression: a statistical view of boosting" by J. Friedman, T. Hastie and R. Tibshirani. The Annals of Statistics, 2000.

[3] Y. Freund and R. E. Schapire. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. EuroCOLT, 1995.

[4] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee. Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods. ICML, 1997.

[5] A. M. Hormann. Programs for Machine Learning. Part I. Information and Control, 1962.

[6] Y. Freund. Boosting a Weak Learning Algorithm by Majority. COLT, 1990.

[7] E. Takimoto and A. Maruoka. Mutual Information Gaining Algorithm and Its Relation to PAC-Learning Algorithm. AII/ALT, 1994.

[8] J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1992.

[9] M. Kearns and Y. Mansour. On the Boosting Ability of Top-Down Decision Tree Learning Algorithms. J. Comput. Syst. Sci., 1999.

[10] R. E. Schapire and Y. Singer. Improved Boosting Algorithms Using Confidence-rated Predictions. COLT, 1998.

[11] R. E. Schapire. The Strength of Weak Learnability. Machine Learning, 1990.

[12] Y. Freund and R. E. Schapire. Game Theory, On-Line Prediction and Boosting. COLT, 1996.

[13] B. K. Natarajan. Machine Learning: A Theoretical Approach, 1992.

[14] J. A. Aslam. Improving Algorithms for Boosting. COLT, 2000.
