Limiting the Number of Trees in Random Forests

The aim of this paper is to propose a simple procedure that determines a priori a minimum number of classifiers to combine in order to obtain a prediction accuracy level similar to the one obtained with the combination of larger ensembles. The procedure is based on the McNemar non-parametric test of significance. Knowing a priori the minimum size of the classifier ensemble giving the best prediction accuracy constitutes a gain in time and memory costs, especially for large databases and real-time applications. Here we applied this procedure to four multiple classifier systems based on C4.5 decision trees (Breiman's Bagging, Ho's Random Subspaces, their combination, which we labeled 'Bagfs', and Breiman's Random Forests) and five large benchmark databases. It is worth noting that the proposed procedure can easily be extended to base learning algorithms other than decision trees. The experimental results showed that it is possible to significantly limit the number of trees. We also showed that the minimum number of trees required to obtain the best prediction accuracy may vary from one classifier combination method to another.
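The Python sketch below is not code from the paper; it only illustrates the general idea under stated assumptions: grow an ensemble in fixed steps and apply McNemar's test to the predictions of consecutive ensemble sizes on a held-out set, stopping once the larger ensemble no longer differs significantly. scikit-learn's RandomForestClassifier (CART-based) stands in for the C4.5 ensembles studied in the paper, and the dataset, step size, maximum size, and alpha = 0.05 are illustrative choices.

```python
# Minimal sketch: stop adding trees once McNemar's test finds no significant
# change between the predictions of consecutive ensemble sizes.
# Assumptions: scikit-learn estimators, illustrative step/alpha/max values.
import numpy as np
from scipy.stats import chi2
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def mcnemar_p(y_true, pred_a, pred_b):
    """McNemar's test (with continuity correction) on the disagreements
    between two sets of predictions made on the same samples."""
    a_wrong = pred_a != y_true
    b_wrong = pred_b != y_true
    n01 = np.sum(a_wrong & ~b_wrong)   # A wrong, B right
    n10 = np.sum(~a_wrong & b_wrong)   # A right, B wrong
    if n01 + n10 == 0:
        return 1.0                     # identical error patterns
    stat = (abs(n01 - n10) - 1) ** 2 / (n01 + n10)
    return chi2.sf(stat, df=1)         # chi-square with 1 degree of freedom


X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

step, max_trees, alpha = 10, 200, 0.05
rf = RandomForestClassifier(warm_start=True, random_state=0)
prev_pred = None
for n_trees in range(step, max_trees + step, step):
    rf.set_params(n_estimators=n_trees)
    rf.fit(X_tr, y_tr)                 # warm_start adds `step` new trees each call
    pred = rf.predict(X_te)
    if prev_pred is not None and mcnemar_p(y_te, prev_pred, pred) > alpha:
        print(f"No significant change in predictions beyond ~{n_trees - step} trees")
        break
    prev_pred = pred
```

Comparing consecutive ensemble sizes on the same test samples is what makes the paired McNemar test appropriate here; an unpaired accuracy comparison would waste the fact that both ensembles classify exactly the same cases.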

[1] J. Ross Quinlan. C4.5: Programs for Machine Learning, 1992.

[2] Zijian Zheng et al. Generating Classifier Committees by Stochastically Selecting both Attributes and Training Examples. PRICAI, 1998.

[3] S. Siegel et al. Nonparametric Statistics for the Behavioral Sciences.

[4] Olivier Debeir et al. Different Ways of Weakening Decision Trees and Their Impact on Classification Accuracy of DT Combination. Multiple Classifier Systems, 2000.

[5] Thomas G. Dietterich. Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Computation, 1998.

[6] Thomas G. Dietterich. An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees, 2000.

[7] Catherine Blake et al. UCI Repository of Machine Learning Databases, 1998.

[8] Bernard R. Rosner. Fundamentals of Biostatistics, 1992.

[9] Stephen D. Bay. Combining Nearest Neighbor Classifiers Through Multiple Feature Subsets. ICML, 1998.

[10] J. Ross Quinlan. Bagging, Boosting, and C4.5. AAAI/IAAI, Vol. 1, 1996.

[11] Ron Kohavi et al. Option Decision Trees with Majority Votes. ICML, 1997.

[12] Olivier Debeir et al. Mixing Bagging and Multiple Feature Subsets to Improve Classification Accuracy of Decision Tree Combination, 2000.

[13] L. Breiman. Arcing Classifier (with discussion and a rejoinder by the author), 1998.

[14] L. Breiman. Random Forests - Random Features, 1999.

[15] Chuanyi Ji et al. Combinations of Weak Classifiers. NIPS, 1996.

[16] Leo Breiman. Bagging Predictors. Machine Learning, 1996.

[17] Kagan Tumer et al. Classifier Combining: Analytical Results and Implications, 1995.

[18] Alberto Maria Segre. Programs for Machine Learning, 1994.

[19] Tin Kam Ho. The Random Subspace Method for Constructing Decision Forests. IEEE Trans. Pattern Anal. Mach. Intell., 1998.

[20] Stephen D. Bay. Nearest Neighbor Classification from Multiple Feature Subsets. Intell. Data Anal., 1999.

[21] L. Breiman. Arcing Classifiers, 1998.

[22] Fabio Roli et al. An Approach to the Automatic Design of Multiple Classifier Systems. Pattern Recognit. Lett., 2001.