The Random Subspace Method for Constructing Decision Forests

Much of the previous attention on decision trees has focused on splitting criteria and the optimization of tree size. The dilemma between overfitting and achieving maximum accuracy is seldom resolved. A method is proposed for constructing a decision-tree-based classifier that maintains the highest accuracy on training data while improving generalization accuracy as it grows in complexity. The classifier consists of multiple trees constructed systematically by pseudorandomly selecting subsets of the components of the feature vector; that is, trees constructed in randomly chosen subspaces. The subspace method is compared to single-tree classifiers and other forest-construction methods in experiments on publicly available datasets, where its superiority is demonstrated. We also discuss the independence between trees in a forest and relate it to the combined classification accuracy.
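To make the construction concrete, here is a minimal sketch of a random-subspace forest, not the paper's original implementation: it assumes scikit-learn's DecisionTreeClassifier as the base learner, and the class name RandomSubspaceForest and parameters such as n_trees and subspace_dim are illustrative choices, not terms from the paper.

```python
# Sketch of the random subspace method: each tree is grown fully (no
# pruning) on all training samples, but sees only a pseudorandomly
# chosen subset of the feature-vector components.
import numpy as np
from sklearn.tree import DecisionTreeClassifier


class RandomSubspaceForest:
    def __init__(self, n_trees=100, subspace_dim=None, random_state=0):
        self.n_trees = n_trees
        self.subspace_dim = subspace_dim  # if None, use half the features
        self.rng = np.random.default_rng(random_state)
        self.trees = []       # fitted trees
        self.subspaces = []   # feature indices used by each tree

    def fit(self, X, y):
        n_features = X.shape[1]
        d = self.subspace_dim or max(1, n_features // 2)
        self.classes_ = np.unique(y)
        for _ in range(self.n_trees):
            # Pseudorandomly select a subspace of the feature vector.
            idx = self.rng.choice(n_features, size=d, replace=False)
            tree = DecisionTreeClassifier().fit(X[:, idx], y)
            self.trees.append(tree)
            self.subspaces.append(idx)
        return self

    def predict(self, X):
        # Combine the trees by averaging their class-probability
        # estimates and taking the most probable class.
        votes = sum(t.predict_proba(X[:, idx])
                    for t, idx in zip(self.trees, self.subspaces))
        return self.classes_[np.argmax(votes, axis=1)]
```

Because every tree is trained on the full sample set, each one can fit the training data perfectly in its own subspace; the generalization gain comes from combining many such trees whose errors are decorrelated by the differing feature subsets.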
