Improved Overlap-based Undersampling for Imbalanced Dataset Classification with Application to Epilepsy and Parkinson's Disease

Classification of imbalanced datasets has attracted substantial research interest over the past decades. Imbalanced datasets are common in several domains, such as health, finance, and security. Many solutions for handling imbalanced datasets focus mainly on the class distribution problem and aim to produce more balanced datasets by means of resampling. However, the existing literature shows that class overlap has a greater negative impact on the learning process than class distribution does. In this paper, we propose overlap-based undersampling methods that maximize the visibility of minority class instances in the overlapping region. This is achieved by using soft clustering together with an elimination threshold that adapts to the degree of overlap, in order to identify and eliminate negative (majority-class) instances in the overlapping region. To make the clustering and the detection of overlapped negative instances more accurate, the presence of the minority class in borderline areas is emphasized by means of oversampling. Extensive experiments were carried out on simulated and real-world datasets covering a wide range of imbalance and overlap scenarios, including extreme cases. Results show a significant improvement in sensitivity and performance competitive with well-established and state-of-the-art methods.
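The core undersampling idea described above (soft clustering, then removing negative instances whose membership places them in the overlap region) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes fuzzy c-means as the soft-clustering algorithm, and it replaces the overlap-adaptive elimination threshold with a hypothetical quantile-based rule (the `quantile` parameter). The function names `fuzzy_c_means` and `overlap_undersample` are illustrative only, and the borderline oversampling step is omitted.

```python
import numpy as np


def fuzzy_c_means(X, c=2, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Plain fuzzy c-means. Returns (centers, U) with U of shape (n, c)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships; each row sums to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        W = U ** m
        # Centers are membership-weighted means of the data, shape (c, p).
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Point-to-center distances, shape (n, c); clipped to avoid division by zero.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1)).
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def overlap_undersample(X, y, minority_label=1, quantile=0.05, seed=0):
    """Drop majority instances that appear to lie in the overlap region.

    HYPOTHETICAL threshold rule (a stand-in for the paper's adaptive threshold):
    the 'minority cluster' is the cluster with the highest mean membership among
    minority instances; a majority instance is removed when its membership to
    that cluster is at least the `quantile`-th quantile of the minority
    instances' memberships. Heavier overlap tends to lower this threshold and
    hence remove more majority instances. Minority instances are always kept.
    """
    _, U = fuzzy_c_means(X, c=2, seed=seed)
    is_min = y == minority_label
    min_cluster = int(np.argmax(U[is_min].mean(axis=0)))
    u = U[:, min_cluster]
    threshold = np.quantile(u[is_min], quantile)
    keep = is_min | (u < threshold)
    return X[keep], y[keep], threshold


# Synthetic, overlapping two-class data (500 majority vs 50 minority instances).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(500, 2)),
               rng.normal(1.5, 1.0, size=(50, 2))])
y = np.array([0] * 500 + [1] * 50)
X_res, y_res, thr = overlap_undersample(X, y)
print(len(y), "->", len(y_res), "instances; threshold =", round(float(thr), 3))
```

In this sketch, raising `quantile` raises the threshold and therefore removes fewer majority instances; how aggressive the removal should be for a given overlap degree is exactly what an adaptive threshold is meant to decide.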
