Effects of Dynamic Subspacing in Random Forest

Owing to its simplicity and strong performance, Random Forest has attracted considerable interest from the research community. In Random Forest, the splitting attribute at each node of a decision tree is selected from a predefined number of randomly chosen attributes (a subset of the entire attribute set). The size of this attribute subset (the subspace) is one of the most influential factors governing the behavior of Random Forest. In this paper, we propose a new technique that dynamically determines the subspace size from the size of the current data segment relative to the entire data set. To assess the effects of the proposed technique, we conduct experiments on five widely used data sets from the UCI Machine Learning Repository. The experimental results indicate that the proposed technique can improve the ensemble accuracy of Random Forest.
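To make the idea concrete, the Python sketch below shows one plausible way such a dynamic rule might be instantiated. The abstract states only that the subspace size depends on the ratio of the current data segment to the full training set; the specific scaling formula, the use of Breiman's log2(M)+1 default as a baseline, the direction of the scaling (larger segments receiving larger subspaces), and the function names are illustrative assumptions, not the paper's actual method.

```python
import math
import random

def dynamic_subspace_size(n_node, n_total, n_attributes):
    """Illustrative sketch of a dynamic subspace-size rule.

    Assumption: interpolate between Breiman's default size and the
    full attribute set according to the relative segment size
    |segment| / |data set|. The paper's actual mapping may differ.
    """
    base = int(math.log2(n_attributes) + 1)   # Breiman's default subspace size
    ratio = n_node / n_total                  # relative size of the current segment
    size = max(1, round(base + ratio * (n_attributes - base)))
    return min(size, n_attributes)

def sample_subspace(attributes, n_node, n_total):
    """Randomly draw the candidate splitting attributes for one node."""
    k = dynamic_subspace_size(n_node, n_total, len(attributes))
    return random.sample(attributes, k)

# Example: at the root the segment equals the full data set, so the
# subspace is largest; deeper nodes see smaller segments and the size
# shrinks back toward the default.
attrs = [f"a{i}" for i in range(32)]
print(len(sample_subspace(attrs, n_node=1000, n_total=1000)))  # 32 at the root
print(len(sample_subspace(attrs, n_node=50, n_total=1000)))    # 7 deeper down
```

Under this (assumed) rule, the subspace size at each node varies with the amount of data reaching it, rather than being fixed for the whole forest as in standard Random Forest.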
