Review of random forest classification techniques to resolve data imbalance

In the current age, imbalanced datasets arising in a wide range of real-world applications are one of the foremost focal points of researchers' attention. Data generation is increasing enormously, and so is the imbalance within datasets. Processing and extracting knowledge from huge amounts of imbalanced data becomes a challenge in terms of space and time requirements. Many real-world applications deal with data samples that are unequally divided among classes. Because of this division, one class becomes the majority and another the minority, with a comparatively small sample count. When one class outnumbers the other, the task shifts towards handling the minority class and achieving a marked reduction in its error rate. Standard learning methods do not directly address this kind of class distribution. Random Forest Classification (RFC) is an ensemble approach that combines a number of classifiers (decision trees) working together to identify the class label of unlabeled instances. This approach has proved its high accuracy and superiority on imbalanced datasets, and several RFC-based techniques exist to resolve the class imbalance problem. This paper summarizes the literature from 2000 to 2016 on RFC-related techniques for resolving class imbalance. Specifically, Weighted Random Forest (WRF), Balanced Random Forest (BRF), sampling methods (Under Sampling (US) and Down Sampling (DS)), and cost-sensitive methods have been adopted most widely to date. As a direction emerging from this body of literature, researchers can focus on dynamic integration techniques to resolve class imbalance and to increase the robustness and versatility of classification.
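The rebalancing ideas named above (class weights for WRF and cost-sensitive learning, balanced per-tree bootstraps for BRF, and under/down sampling) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the method of any surveyed paper: the inverse-frequency weight w_c = n / (k · n_c) is assumed as one common weighting choice, all function names are hypothetical, and only the per-tree sampling step and the weighted split criterion are shown, not full tree growing or voting.

```python
import random
from collections import Counter


def class_weights(labels):
    """Inverse-frequency class weights w_c = n / (k * n_c).

    Assumption: one common choice for WRF-style / cost-sensitive forests;
    rarer classes receive larger weights, so their errors cost more.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}


def weighted_gini(labels, weights):
    """Gini impurity in which each sample counts with its class weight."""
    totals = Counter()
    for y in labels:
        totals[y] += weights[y]
    z = sum(totals.values())
    return 1.0 - sum((t / z) ** 2 for t in totals.values())


def _indices_by_class(labels):
    """Group sample indices by class label (hypothetical helper)."""
    groups = {}
    for i, y in enumerate(labels):
        groups.setdefault(y, []).append(i)
    return groups


def balanced_bootstrap(labels, rng):
    """BRF-style draw for one tree: from every class, draw as many
    indices as the minority class has, with replacement."""
    groups = _indices_by_class(labels)
    m = min(len(idx) for idx in groups.values())
    sample = []
    for idx in groups.values():
        sample.extend(rng.choice(idx) for _ in range(m))
    return sample


def random_undersample(labels, rng):
    """Under/down sampling: keep a random subset (without replacement)
    of minority-class size from each class; the minority is kept whole."""
    groups = _indices_by_class(labels)
    m = min(len(idx) for idx in groups.values())
    sample = []
    for idx in groups.values():
        sample.extend(rng.sample(idx, m))
    return sorted(sample)


if __name__ == "__main__":
    rng = random.Random(0)
    y = [0] * 90 + [1] * 10  # hypothetical 9:1 imbalanced labels
    w = class_weights(y)
    print({c: round(v, 3) for c, v in w.items()})  # {0: 0.556, 1: 5.0}
    print(round(weighted_gini(y, {0: 1.0, 1: 1.0}), 3))  # 0.18 (unweighted)
    print(round(weighted_gini(y, w), 3))  # 0.5 (classes weighted equally)
    boot = balanced_bootstrap(y, rng)
    print(Counter(y[i] for i in boot))  # Counter({0: 10, 1: 10})
    print(len(random_undersample(y, rng)))  # 20
```

On a 9:1 label set, the unweighted Gini of the root node is only 0.18, so splits are driven by the majority class; with inverse-frequency weights the root impurity becomes 0.5, as if the two classes were equally frequent. The balanced bootstrap and the undersample both hand each tree 10 samples per class, the difference being that BRF draws with replacement and repeats the draw independently for every tree.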
