Improving Classification Accuracy based on Random Forest Model with Uncorrelated High Performing Trees

Random forest can achieve high classification performance by forming an ensemble of decision trees, each grown on a randomly selected subspace of the data. The performance of an ensemble learner depends strongly on both the accuracy of each component learner and the diversity among those components. In a random forest, the randomization can produce poorly performing trees and may also yield highly correlated trees, both of which degrade the ensemble's classification decisions. In this paper, an attempt is made to improve the model's performance by retaining only uncorrelated, high-performing trees in the random forest. Experimental results show that the classification accuracy of random forest can be further improved in this way.

General Terms: Random forest, Classification Accuracy, Uncorrelated trees.
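The idea of pruning a forest down to accurate, weakly correlated trees can be illustrated with a minimal sketch. This is not the paper's exact algorithm: the use of scikit-learn, the held-out validation set, and the accuracy and correlation thresholds are all illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data and a standard forest (hypothetical setup, not from the paper).
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Score every individual tree on the held-out validation set.
preds = np.array([tree.predict(X_val) for tree in forest.estimators_])
accs = (preds == y_val).mean(axis=1)

# Greedily keep trees in decreasing order of accuracy, skipping any tree
# whose validation predictions correlate strongly with an already-kept
# tree. Both thresholds (0.5 accuracy, 0.9 correlation) are illustrative.
order = np.argsort(accs)[::-1]
selected = [order[0]]
for i in order[1:]:
    corrs = [np.corrcoef(preds[i], preds[j])[0, 1] for j in selected]
    if accs[i] > 0.5 and max(corrs) < 0.9:
        selected.append(i)

# Replace the ensemble's trees with the pruned subset.
forest.estimators_ = [forest.estimators_[i] for i in selected]
```

After pruning, `forest.predict` aggregates votes over the retained subset only, so the ensemble decision comes from accurate and mutually diverse trees.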
