A New Diversity Technique for Imbalance Learning Ensembles

Data mining and machine learning techniques designed to solve classification problems require balanced class distribution. However, in reality sometimes the classification of datasets indicates the existence of a class represented by a large number of instances whereas there are classes with far fewer instances. This problem is known as the class imbalance problem. Classifier Ensembles is a method often used in overcoming class imbalance problems. Data Diversity is one of the cornerstones of ensembles. An ideal ensemble system should have accurrate individual classifiers and if there is an error it is expected to occur on different objects or instances. This research will present the results of overview and experimental study using Hybrid Approach Redefinition (HAR) Method in handling class imbalance and at the same time expected to get better data diversity. This research will be conducted using 6 datasets with different imbalanced ratios and will be compared with SMOTEBoost which is one of the Re-Weighting method which is often used in handling class imbalance. This study shows that the data diversity is related to performance in the imbalance learning ensembles and the proposed methods can obtain better data diversity.

[1]  Opim Salim Sitompul,et al.  Optimization Model of K-Means Clustering Using Artificial Neural Networks to Handle Class Imbalance Problem , 2018 .

[2]  Francisco Herrera,et al.  A Review on Ensembles for the Class Imbalance Problem: Bagging-, Boosting-, and Hybrid-Based Approaches , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[3]  Yang Wang,et al.  Cost-sensitive boosting for classification of imbalanced data , 2007, Pattern Recognit..

[4]  G. Yule,et al.  On the association of attributes in statistics, with examples from the material of the childhood society, &c , 1900, Proceedings of the Royal Society of London.

[5]  George D. C. Cavalcanti,et al.  A study on combining dynamic selection and data preprocessing for imbalance learning , 2018, Neurocomputing.

[6]  Jian Gao,et al.  A new sampling method for classifying imbalanced data based on support vector machine ensemble , 2016, Neurocomputing.

[7]  Nitesh V. Chawla,et al.  SPECIAL ISSUE ON LEARNING FROM IMBALANCED DATA SETS , 2004 .

[8]  Xin Yao,et al.  Diversity analysis on imbalanced data sets by using ensemble models , 2009, 2009 IEEE Symposium on Computational Intelligence and Data Mining.

[9]  Krishna M. Sivalingam,et al.  Learning from class-imbalanced data in wireless sensor networks , 2003, 2003 IEEE 58th Vehicular Technology Conference. VTC 2003-Fall (IEEE Cat. No.03CH37484).

[10]  Juan José Rodríguez Diez,et al.  Random Balance: Ensembles of variable priors classifiers for imbalanced data , 2015, Knowl. Based Syst..

[11]  Juan José Rodríguez Diez,et al.  Diversity techniques improve the performance of the best imbalance learning ensembles , 2015, Inf. Sci..

[12]  Nitesh V. Chawla,et al.  SMOTE: Synthetic Minority Over-sampling Technique , 2002, J. Artif. Intell. Res..

[13]  Subhash C. Bagui,et al.  Combining Pattern Classifiers: Methods and Algorithms , 2005, Technometrics.