Highly imbalanced classification using improved rotation forests

Imbalanced data classification is a challenging problem in data mining. It arises in many real-world applications and has attracted growing attention from researchers. The problem occurs when one class has far more instances than the other. Classifier ensembles are well known to be an effective solution. Two novel ensemble algorithms based on rotation forests, RUROForest and SROForest, are proposed for highly imbalanced problems. The proposed algorithms combine random under-sampling or SMOTE with rotation forest, which balances the uneven class distribution of the data sets while preserving the diversity of the individual classifiers. Focusing on two-class highly imbalanced problems, experiments are conducted on 22 data sets. Experimental results and statistical analyses show that the proposed methods outperform state-of-the-art ensemble methods on AUC, the most widely used evaluation measure for imbalanced data.
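
The balancing step mentioned above can be illustrated with a minimal sketch. It shows only random under-sampling, drawn independently for each ensemble member so that every base classifier sees a different balanced subset. The rotation step and the base classifiers are omitted, and the function names `random_under_sample` and `balanced_training_sets` are hypothetical, chosen for illustration rather than taken from the paper.

```python
import random
from collections import Counter


def random_under_sample(X, y, rng):
    """Return a class-balanced subset by randomly discarding rows
    from every class that is larger than the smallest class."""
    # Group row indices by class label.
    by_class = {}
    for idx, label in enumerate(y):
        by_class.setdefault(label, []).append(idx)

    # Every class is cut down to the size of the smallest class.
    n_min = min(len(ids) for ids in by_class.values())

    keep = []
    for ids in by_class.values():
        keep.extend(rng.sample(ids, n_min))  # sampling without replacement
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def balanced_training_sets(X, y, n_members, seed=0):
    """Draw one independent balanced subset per ensemble member
    (assumption: a fresh random draw per member keeps the members diverse)."""
    rng = random.Random(seed)
    return [random_under_sample(X, y, rng) for _ in range(n_members)]


if __name__ == "__main__":
    # Toy two-class data: 5 minority rows (label 1), 95 majority rows (label 0).
    X = [[float(i), float(i % 7)] for i in range(100)]
    y = [1 if i < 5 else 0 for i in range(100)]
    for Xb, yb in balanced_training_sets(X, y, n_members=3, seed=42):
        print(len(Xb), dict(Counter(yb)))
```

In this sketch each member keeps all minority rows and a different random draw of majority rows. In a full pipeline, each balanced subset would then be passed to the rotation and tree-building stages.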
