An Improved Oversampling Algorithm Based on an Informative Sample Selection Strategy for Solving Imbalance

Imbalanced data has long been a focus of classification research. It describes a scenario in which data samples are unevenly distributed across classes, so that one or more classes in the dataset are underrepresented. This imbalance degrades the performance of conventional learning models trained on such datasets. Although numerous strategies have been developed to address class imbalance during data pre-processing, the key problem remains finding appropriate samples from which to create synthetic data. In this study, we propose an efficient oversampling method, called Informative Sample Selection (ISS), for addressing imbalanced classification problems. The main goal of ISS is to identify informative samples in the minority class that can be used to generate synthetic data. We conducted experiments on 22 imbalanced datasets to evaluate the performance of the proposed method. We compared ISS with several state-of-the-art techniques, including SMOTE, Borderline-SMOTE, ADASYN, Safe-Level-SMOTE, and random oversampling (ROS), using AUC and F-measure as the evaluation measures. The results show that ISS outperforms these existing approaches, indicating significant progress in tackling the challenges that imbalanced data poses for classification.
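The general idea of choosing minority samples and then synthesizing new ones from them can be sketched as follows. This is a minimal illustration of SMOTE-style linear interpolation, not the ISS algorithm itself: the abstract does not specify how ISS scores or selects informative samples, so the `select_seeds` hook, the function names, and the toy data are all assumptions made for illustration.

```python
import math
import random


def k_nearest(point, candidates, k):
    """Return the k candidates closest to `point` (Euclidean distance).

    The point itself is skipped by identity, so `point` should be one of
    the objects stored in `candidates`.
    """
    others = [c for c in candidates if c is not point]
    return sorted(others, key=lambda c: math.dist(point, c))[:k]


def smote_like_oversample(minority, n_new, k=3, select_seeds=None, rng=None):
    """Generate `n_new` synthetic minority samples by linear interpolation.

    `select_seeds` is a hypothetical hook (an assumption, not part of the
    paper): it receives the minority samples and returns the subset allowed
    to act as seeds. With the default of None every minority sample is a
    seed, which corresponds to plain SMOTE-style behaviour.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = rng or random.Random(0)
    seeds = select_seeds(minority) if select_seeds else list(minority)
    synthetic = []
    for _ in range(n_new):
        seed = rng.choice(seeds)
        neighbour = rng.choice(k_nearest(seed, minority, k))
        gap = rng.random()  # position along the seed-neighbour segment, in [0, 1)
        synthetic.append(
            tuple(s + gap * (n - s) for s, n in zip(seed, neighbour))
        )
    return synthetic


# Toy minority class: three clustered points and one isolated point.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0)]

# Plain SMOTE-style oversampling: any minority sample may be a seed.
new_points = smote_like_oversample(minority, n_new=6, k=2)

# Restricting seeds to the clustered points (a stand-in for a selection rule).
clustered = smote_like_oversample(
    minority, n_new=6, k=2, select_seeds=lambda pts: pts[:3]
)
```

In the second call, restricting the seeds to the three clustered points keeps every synthetic sample inside that cluster, which shows why the choice of seed samples matters. A real selection strategy would replace the placeholder lambda with a criterion derived from the data.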
