Diversified Random Forests Using Random Subspaces

Random Forest is an ensemble learning method used for classification and regression. In such an ensemble, each constituent classifier casts one vote for its predicted class label, and majority voting determines the class label assigned to unlabelled instances. Since it has been shown empirically that ensembles tend to yield better results when there is significant diversity among the constituent models, many extensions have been developed over the past decade that induce diversity in the constituent models in order to improve the performance of Random Forests in terms of both speed and accuracy. In this paper, we propose a method that promotes Random Forest diversity by using randomly selected feature subspaces, assigning each subspace a weight according to its predictive power, and using this weight in majority voting. An experimental study on 15 real datasets showed favourable results, demonstrating the potential of the proposed method.
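The following is a minimal sketch of the idea described above, not the paper's exact algorithm: each tree is grown on a randomly drawn feature subspace, the subspace is weighted by accuracy on a held-out validation split (one plausible stand-in for the paper's "predictive power" measure, which may be defined differently), and the weights are used in majority voting at prediction time. The class name DiversifiedRF and all parameter choices here are hypothetical, for illustration only.

```python
# Sketch of a diversified random forest with weighted random subspaces.
# Assumes X and y are NumPy arrays; uses scikit-learn decision trees.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

class DiversifiedRF:  # hypothetical name, illustrative only
    def __init__(self, n_trees=50, subspace_size=0.5, random_state=0):
        self.n_trees = n_trees
        self.subspace_size = subspace_size  # fraction of features per subspace
        self.rng = np.random.RandomState(random_state)
        self.trees, self.subspaces, self.weights = [], [], []

    def fit(self, X, y):
        # Hold out part of the training data to estimate each subspace's
        # predictive power (an assumption; the paper may use another estimate).
        X_tr, X_val, y_tr, y_val = train_test_split(
            X, y, test_size=0.3, random_state=0)
        n_features = X.shape[1]
        k = max(1, int(self.subspace_size * n_features))
        self.classes_ = np.unique(y)
        for _ in range(self.n_trees):
            # Draw a random feature subspace for this tree.
            subspace = self.rng.choice(n_features, size=k, replace=False)
            tree = DecisionTreeClassifier(
                random_state=self.rng.randint(1 << 30))
            tree.fit(X_tr[:, subspace], y_tr)
            # Weight = validation accuracy of the subspace-restricted tree.
            self.trees.append(tree)
            self.subspaces.append(subspace)
            self.weights.append(tree.score(X_val[:, subspace], y_val))
        return self

    def predict(self, X):
        # Weighted majority voting: each tree's vote counts in proportion
        # to the weight of the subspace it was trained on.
        votes = np.zeros((X.shape[0], len(self.classes_)))
        for tree, subspace, w in zip(self.trees, self.subspaces, self.weights):
            preds = tree.predict(X[:, subspace])
            for c_idx, c in enumerate(self.classes_):
                votes[preds == c, c_idx] += w
        return self.classes_[np.argmax(votes, axis=1)]
```

Weighting votes by held-out accuracy is one simple way to let stronger subspaces dominate the ensemble decision while weak but diverse subspaces still contribute; the random subspace draw itself is what supplies the diversity among the constituent trees.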
