Correlation-based Oversampling aided Cost Sensitive Ensemble learning technique for Treatment of Class Imbalance

ABSTRACT Class imbalance and its effect on conventional learning models is a well-investigated topic, as it strongly influences the performance of real-life classification tasks. Among the available solutions, the Synthetic Minority Oversampling Technique (SMOTE) is effective at balancing the data by generating synthetic minority instances. However, SMOTE tends to generate redundant data because it applies a uniform oversampling rate; for this reason, SMOTE with a customised oversampling rate has been investigated recently. In parallel, ensemble learning approaches are quite effective at improving the predictive ability of a set of weak classifiers through adaptive-weighted training. However, they do not account for the imbalanced nature of the data during training. This paper proposes Correlation-based Oversampling aided Cost Sensitive Ensemble learning (CorrOV-CSEn), which integrates correlation-based oversampling with the AdaBoost ensemble learning model. The correlation-based oversampling defines a customised oversampling rate and a suitable oversampling zone, while a misclassification ratio-based cost function is introduced into the AdaBoost model to govern adaptive learning of imbalanced cases. CorrOV-CSEn is evaluated against 13 state-of-the-art methods on 8 simulation datasets. The experimental results show CorrOV-CSEn to be more effective than the state-of-the-art methods in resolving these issues.
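
To make the idea of a misclassification ratio-based cost inside AdaBoost more concrete, the following is a minimal sketch of a single reweighting round. The abstract does not give the actual cost function, so the form used here (one plus the fraction of each class that the weak learner misclassified) and the function name `cost_sensitive_weight_update` are illustrative assumptions, not the CorrOV-CSEn definition.

```python
import numpy as np


def cost_sensitive_weight_update(weights, y_true, y_pred):
    """One AdaBoost-style reweighting round with a class-dependent cost.

    Illustrative only: the cost term is an assumed stand-in for the
    paper's misclassification ratio-based cost function.
    """
    weights = np.asarray(weights, dtype=float)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    miss = y_true != y_pred

    # Weighted error of the weak learner, clipped so alpha stays finite.
    err = np.clip(weights[miss].sum() / weights.sum(), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)

    # Assumed cost: 1 + fraction of the instance's class that was misclassified.
    cost = np.ones_like(weights)
    for label in np.unique(y_true):
        in_class = y_true == label
        cost[in_class] = 1.0 + miss[in_class].mean()

    # Misclassified instances gain weight, correct ones lose weight,
    # and both changes are scaled by the class-dependent cost.
    direction = np.where(miss, 1.0, -1.0)
    new_weights = weights * np.exp(alpha * cost * direction)
    return new_weights / new_weights.sum(), alpha
```

With two minority and six majority instances and one error in each class, the minority class has the higher misclassification ratio, so its misclassified instance receives a larger weight in the next round than the misclassified majority instance.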
