An Experimental Assessment of Random Forest Classification Performance Improvisation with Sampling and Stage Wise Success Rate Calculation

Abstract Imbalanced data classification with the Random Forest Classification (RFC) technique has gained prominence in current applications. Data imbalance in practical applications takes the form of either binary class imbalance or multiclass imbalance. In binary class imbalance, one class holds the majority of the data samples and the other holds only a small number. Multiclass imbalanced datasets fall into two categories: Multiclass Minority Imbalanced Class (MMinIC) and Multiclass Majority Imbalanced Class (MMajIC). Classification performance tends to degrade more for MMajIC than for MMinIC because the imbalance is more severe. This paper investigates the influence of RFC on binary and multiclass imbalanced datasets. The analysis measures classification accuracy together with the following performance metrics, reported for each class in the referenced dataset: True Positive (TP) Rate, False Positive (FP) Rate, Precision (Pre), Recall (Rec), F-Measure, Receiver Operating Characteristic (ROC) area, Matthews Correlation Coefficient (MCC), and Precision-Recall Curve (PRC) area. The paper focuses on reducing the negative influence of imbalanced data by using the Synthetic Minority Oversampling Technique (SMOTE). The experimental analysis uses datasets from the Knowledge Extraction based on Evolutionary Learning (KEEL) imbalanced-data repository and combines RFC with SMOTE. It also covers RFC model construction with stage-wise success rate calculation on the training and testing partitions and its impact on accuracy. The study includes an error analysis of incorrectly classified instances. Experimental results indicate that data imbalance has a significant impact on classification accuracy and that RFC performs better when combined with SMOTE.
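The count-based metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `binary_metrics` and the confusion-matrix counts are hypothetical, chosen only to show how overall accuracy can look strong on an imbalanced binary problem while the minority-class metrics stay weak. ROC area and PRC area are omitted because they require ranked class scores, not just counts.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute count-based metrics for the positive (minority) class.

    tp, fp, tn, fn are confusion-matrix counts. Ratios with a zero
    denominator are reported as 0.0 (an assumption of this sketch).
    """
    def safe_div(num, den):
        return num / den if den else 0.0

    tp_rate = safe_div(tp, tp + fn)        # TP rate, identical to recall
    fp_rate = safe_div(fp, fp + tn)        # FP rate
    precision = safe_div(tp, tp + fp)
    f_measure = safe_div(2 * precision * tp_rate, precision + tp_rate)
    mcc = safe_div(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    return {
        "tp_rate": tp_rate,
        "fp_rate": fp_rate,
        "precision": precision,
        "recall": tp_rate,
        "f_measure": f_measure,
        "mcc": mcc,
        "accuracy": accuracy,
    }


# Hypothetical imbalanced test set: 10 minority and 90 majority samples.
m = binary_metrics(tp=5, fp=5, tn=85, fn=5)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, accuracy is 0.9 although only half of the minority samples are recovered, which is why per-class metrics such as recall, F-Measure, and MCC are reported alongside accuracy.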
