Paper information - An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization

Bagging and boosting are methods that generate a diverse ensemble of classifiers by manipulating the training data given to a “base” learning algorithm. Breiman has pointed out that they rely for their effectiveness on the instability of the base learning algorithm. An alternative approach to generating an ensemble is to randomize the internal decisions made by the base algorithm. This general approach has been studied previously by Ali and Pazzani and by Dietterich and Kong. This paper compares the effectiveness of randomization, bagging, and boosting for improving the performance of the decision-tree algorithm C4.5. The experiments show that in situations with little or no classification noise, randomization is competitive with (and perhaps slightly superior to) bagging but not as accurate as boosting. In situations with substantial classification noise, bagging is much better than boosting, and sometimes better than randomization.
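The contrast the abstract draws between manipulating the training data (bagging) and randomizing the learner's internal decisions can be sketched in a few lines. This is only an illustrative toy under stated assumptions, not the paper's C4.5 setup: the base learner is a hypothetical one-feature threshold rule (a "decision stump"), the function names are invented for this example, and `top_k=3` is an arbitrary illustrative value. Boosting is left out to keep the sketch short.

```python
import random
from collections import Counter

def candidate_stumps(data):
    """Score every rule 'predict hi if x > t else lo' by its training error."""
    thresholds = sorted({x for x, _ in data})
    scored = []
    for t in thresholds:
        for lo, hi in ((0, 1), (1, 0)):
            errors = sum((hi if x > t else lo) != y for x, y in data)
            scored.append((errors, t, lo, hi))
    scored.sort()  # fewest training errors first
    return scored

def make_stump(t, lo, hi):
    """Turn a (threshold, low label, high label) triple into a classifier."""
    return lambda x: hi if x > t else lo

def train_bagged(data, n_members, rng):
    """Bagging: each member is trained on a bootstrap resample of the data;
    the base learner itself is deterministic (it takes the best split)."""
    members = []
    for _ in range(n_members):
        sample = [rng.choice(data) for _ in data]  # draw with replacement
        _, t, lo, hi = candidate_stumps(sample)[0]
        members.append(make_stump(t, lo, hi))
    return members

def train_randomized(data, n_members, rng, top_k=3):
    """Randomization: each member sees the full data, but the split is chosen
    at random among the top_k best candidates (top_k is an assumption here)."""
    best = candidate_stumps(data)[:top_k]
    members = []
    for _ in range(n_members):
        _, t, lo, hi = rng.choice(best)
        members.append(make_stump(t, lo, hi))
    return members

def vote(members, x):
    """Combine the ensemble by unweighted majority vote."""
    return Counter(m(x) for m in members).most_common(1)[0][0]

# Toy one-dimensional data set: label 0 below 0.5, label 1 above.
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
        (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
rng = random.Random(0)
bagged = train_bagged(data, 25, rng)
randomized = train_randomized(data, 25, rng)
```

Calling `vote(bagged, x)` or `vote(randomized, x)` classifies a new point. In both ensembles the diversity comes from a single source of randomness: the resampled training set in bagging, and the randomly chosen split in randomization.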

Thomas G. Dietterich

[1] Kamal A. Ali. A Comparison of Methods for Learning and Combining Evidence From Multiple Models, 1995.

[2] Ron Kohavi et al. Data Mining using MLC++, 1996.

[3] Yoav Freund et al. Experiments with a New Boosting Algorithm, 1996, ICML.

[4] J. Ross Quinlan et al. Bagging, Boosting, and C4.5, 1996, AAAI/IAAI, Vol. 1.

[5] Leo Breiman et al. Bias, Variance, and Arcing Classifiers, 1996.

[6] L. Breiman. Heuristics of instability and stabilization in model selection, 1996.

[7] David W. Opitz et al. An Empirical Evaluation of Bagging and Boosting, 1997, AAAI/IAAI.

[8] Ron Kohavi et al. Option Decision Trees with Majority Votes, 1997, ICML.

[9] Ron Kohavi et al. Data Mining Using MLC++: a Machine Learning Library in C++, 1996, Int. J. Artif. Intell. Tools.

[10] Thomas G. Dietterich et al. Pruning Adaptive Boosting, 1997, ICML.

[11] Thomas G. Dietterich. Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms, 1998, Neural Computation.

[12] Catherine Blake et al. UCI Repository of machine learning databases, 1998.

[13] Michael J. Pazzani et al. Error reduction through learning multiple descriptions, 2004, Machine Learning.

[14] Leo Breiman et al. Bagging Predictors, 1996, Machine Learning.

[15] Eric Bauer et al. An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants, 1999, Machine Learning.