On the Boosting Pruning Problem

Boosting is a powerful method for improving the predictive accuracy of classifiers. The ADABOOST algorithm of Freund and Schapire has been successfully applied to many domains [2, 10, 12], and its combination with the C4.5 decision tree algorithm has been called the best off-the-shelf learning algorithm in practice. Unfortunately, in some applications the number of decision trees ADABOOST requires to reach a reasonable accuracy is enormous, making the ensemble very space-consuming. This problem was first studied by Margineantu and Dietterich [7], who proposed an empirical method called Kappa pruning, which prunes the boosted ensemble of decision trees without sacrificing much accuracy. In this work in progress we propose a potential improvement to Kappa pruning and also study the boosting pruning problem from a theoretical perspective. We point out that the boosting pruning problem is intractable even to approximate. Finally, we suggest a margin-based theoretical heuristic for this problem.
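The kappa statistic at the heart of Kappa pruning measures how much two classifiers agree beyond chance; pruning then favors the most diverse (lowest-kappa) pairs. The sketch below is illustrative only: the function names `kappa` and `kappa_prune` are ours, and the greedy pair selection is one plausible reading of the Margineantu–Dietterich heuristic, not their exact algorithm.

```python
from collections import Counter

def kappa(preds_a, preds_b):
    """Cohen's kappa agreement between two classifiers' prediction lists."""
    n = len(preds_a)
    # Observed agreement: fraction of examples the two classifiers label identically.
    theta1 = sum(a == b for a, b in zip(preds_a, preds_b)) / n
    # Chance agreement: expected agreement from the two label marginals.
    ca, cb = Counter(preds_a), Counter(preds_b)
    theta2 = sum((ca[lbl] / n) * (cb[lbl] / n) for lbl in set(ca) | set(cb))
    if theta2 == 1.0:  # both classifiers constant and identical: total agreement
        return 1.0
    return (theta1 - theta2) / (1 - theta2)

def kappa_prune(all_preds, keep):
    """Greedy sketch: keep up to `keep` classifiers drawn from the most
    diverse (lowest-kappa) pairs, in the spirit of Kappa pruning."""
    m = len(all_preds)
    pairs = [(kappa(all_preds[i], all_preds[j]), i, j)
             for i in range(m) for j in range(i + 1, m)]
    pairs.sort()  # most diverse pairs first
    kept = []
    for _, i, j in pairs:
        for idx in (i, j):
            if idx not in kept and len(kept) < keep:
                kept.append(idx)
        if len(kept) >= keep:
            break
    return sorted(kept)
```

For example, two identical prediction vectors give kappa 1, two perfectly opposed ones give kappa −1, and `kappa_prune` would retain the opposed (diverse) pair first.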

[1]  Yoav Freund, et al.  Experiments with a New Boosting Algorithm, 1996, ICML.

[2]  Yoav Freund, et al.  A decision-theoretic generalization of on-line learning and an application to boosting, 1997, J. Comput. Syst. Sci.

[3]  J. Ross Quinlan, et al.  Bagging, Boosting, and C4.5, 1996, AAAI/IAAI, Vol. 1.

[4]  Yoav Freund, et al.  Boosting the margin: A new explanation for the effectiveness of voting methods, 1997, ICML.

[5]  Viggo Kann, et al.  Polynomially Bounded Minimization Problems That Are Hard to Approximate, 1993, Nord. J. Comput.

[6]  Dorit S. Hochbaum, et al.  Approximation Algorithms for NP-Hard Problems, 1997, SIGACT News.

[7]  Alberto Maria Segre, et al.  Programs for Machine Learning, 1994.

[8]  Yoram Singer, et al.  Improved Boosting Algorithms Using Confidence-rated Predictions, 1998, COLT '98.

[9]  W. Hoeffding.  Probability Inequalities for Sums of Bounded Random Variables, 1963.

[10]  Catherine Blake, et al.  UCI Repository of machine learning databases, 1998.

[11]  Thomas G. Dietterich, et al.  Pruning Adaptive Boosting, 1997, ICML.

[12]  Yoav Freund, et al.  A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.

[13]  David S. Johnson, et al.  Computers and Intractability: A Guide to the Theory of NP-Completeness, 1979.

[14]  Christopher J. Merz, et al.  UCI Repository of Machine Learning Databases, 1996.

[15]  J. Ross Quinlan, et al.  C4.5: Programs for Machine Learning, 1992.