Dynamic Synthetic Minority Over-Sampling Technique-Based Rotation Forest for the Classification of Imbalanced Hyperspectral Data

Rotation forest (RoF) is a powerful ensemble classifier that has attracted substantial attention for its performance in hyperspectral data classification. Multi-class imbalance learning is one of the biggest challenges in machine learning and remote sensing. The standard procedure for constructing an RoF ensemble tends to favor overall accuracy, so RoF has difficulty sufficiently recognizing the minority classes. This paper proposes a novel dynamic SMOTE (synthetic minority over-sampling technique)-based RoF algorithm for the multi-class imbalance problem. The main idea of the proposed method is to dynamically balance the class distribution before building each rotation decision tree. A resampling rate is set in each iteration, ranging from 10% in the first iteration to 100% in the last; this rate defines the number of minority class instances randomly resampled (with replacement) from the original dataset in that iteration. The remaining minority class instances are generated by SMOTE. Results on three real hyperspectral datasets show that the proposed method outperforms random forest, RoF, and several popular data sampling methods.
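The per-iteration balancing step can be sketched in Python with NumPy. This is a minimal illustration under assumptions the abstract does not state. The rate is assumed to grow linearly between the two stated endpoints. Each minority class is assumed to be raised to the size of the largest class, which is left unchanged. The number of instances drawn with replacement is assumed to be the rate times that target size. SMOTE is written out directly rather than taken from a library, and the function names `resampling_rate`, `smote_samples`, and `dynamic_balance` are hypothetical. The rotation step itself (PCA on feature subsets followed by tree training) is omitted.

```python
import numpy as np


def resampling_rate(t: int, n_iter: int, start: float = 0.1, end: float = 1.0) -> float:
    """Rate for iteration t (0-based).

    Assumption: only the endpoints (10% and 100%) come from the text;
    the linear schedule between them is a hypothetical choice.
    """
    if n_iter == 1:
        return end
    return start + (end - start) * t / (n_iter - 1)


def smote_samples(Xc: np.ndarray, n_new: int, k: int, rng: np.random.Generator) -> np.ndarray:
    """Generate n_new synthetic points by SMOTE interpolation within one class."""
    if n_new <= 0:
        return np.empty((0, Xc.shape[1]))
    if len(Xc) < 2:
        # No neighbour to interpolate with: fall back to plain duplication.
        return Xc[rng.integers(0, len(Xc), size=n_new)]
    k = min(k, len(Xc) - 1)
    # Pairwise squared Euclidean distances within the class.
    d = ((Xc[:, None, :] - Xc[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]  # k nearest same-class neighbours
    base = rng.integers(0, len(Xc), size=n_new)
    neigh = nn[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return Xc[base] + gap * (Xc[neigh] - Xc[base])


def dynamic_balance(X, y, t, n_iter, k=5, rng=None):
    """Balance (X, y) before building the t-th rotation tree.

    Each minority class is raised to the largest class size (assumption):
    a fraction `rate` of that target is resampled with replacement from the
    original instances and the rest is generated by SMOTE.
    """
    rng = np.random.default_rng() if rng is None else rng
    rate = resampling_rate(t, n_iter)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [], []
    for c, n_c in zip(classes, counts):
        Xc = X[y == c]
        if n_c == target:
            X_parts.append(Xc)  # largest class kept as-is (assumption)
        else:
            n_res = int(round(rate * target))
            resampled = Xc[rng.integers(0, n_c, size=n_res)]
            synthetic = smote_samples(Xc, target - n_res, k, rng)
            X_parts.append(np.vstack([resampled, synthetic]))
        y_parts.append(np.full(target, c))
    return np.vstack(X_parts), np.concatenate(y_parts)
```

In this sketch the first tree sees mostly SMOTE-generated minority instances, while the last tree (rate 100%) sees only instances resampled from the original data, so successive trees are trained on differently composed balanced sets.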
