Boosting the oversampling methods based on differential evolution strategies for imbalanced learning

Abstract Class imbalance is a challenging problem in data mining. To overcome the low classification performance associated with imbalanced datasets, sampling strategies are used to balance them. Oversampling is a technique that increases the number of minority class samples in various proportions. In this work, 16 different differential evolution (DE) strategies are used to oversample imbalanced datasets for better classification. The main aim of this work is to determine the best strategy in terms of the Area Under the receiver operating characteristic (ROC) Curve (AUC) and Geometric Mean (G-Mean) metrics. Experiments are run on 44 imbalanced datasets, with Support Vector Machines (SVM), k-Nearest Neighbor (kNN), and Decision Tree (DT) as classifiers. The best results are produced by the 6th Debohid Strategy (DSt6), the 1st Debohid Strategy (DSt1), and the 3rd Debohid Strategy (DSt3) with the kNN, DT, and SVM classifiers, respectively. The obtained results outperform 9 state-of-the-art oversampling methods in terms of the AUC and G-Mean metrics.
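As a rough illustration of the two ideas named in the abstract, the sketch below computes G-Mean as the square root of sensitivity times specificity and generates synthetic minority samples with the classic DE/rand/1/bin step. It is a minimal sketch under stated assumptions and not the authors' method: the abstract does not define the 16 strategies, so the choice of DE/rand/1/bin, the function names (`g_mean`, `de_rand_1_sample`, `oversample`), and the values `f=0.5` and `cr=0.9` are all illustrative assumptions.

```python
import math
import random


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (TPR) and specificity (TNR)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def de_rand_1_sample(minority, f=0.5, cr=0.9, rng=random):
    """Build one synthetic sample from four distinct minority samples.

    Assumed strategy: DE/rand/1 mutation, v = x_r1 + f * (x_r2 - x_r3),
    followed by binomial crossover against a randomly chosen target sample.
    """
    if len(minority) < 4:
        raise ValueError("DE/rand/1 needs at least four minority samples")
    target, r1, r2, r3 = rng.sample(minority, 4)
    dim = len(target)
    j_rand = rng.randrange(dim)  # guarantees at least one mutated feature
    child = []
    for j in range(dim):
        if j == j_rand or rng.random() < cr:
            child.append(r1[j] + f * (r2[j] - r3[j]))
        else:
            child.append(target[j])
    return child


def oversample(minority, n_majority, f=0.5, cr=0.9, seed=0):
    """Add synthetic samples until the minority class matches n_majority."""
    rng = random.Random(seed)
    n_new = max(0, n_majority - len(minority))
    synthetic = [de_rand_1_sample(minority, f, cr, rng) for _ in range(n_new)]
    return list(minority) + synthetic


if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    balanced = oversample(minority, n_majority=10)
    print(len(balanced), "minority samples after oversampling")
    print("G-Mean:", g_mean([1, 1, 0, 0], [1, 0, 0, 0]))
```

Other strategies would presumably change only how the mutant vector is formed (for example, which base vector is used or how many difference vectors are added), so in a sketch like this only `de_rand_1_sample` would need to be swapped out.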
