Envelope Imbalance Learning Algorithm Based on Multilayer Fuzzy C-means Clustering and Minimum Interlayer Discrepancy

Imbalanced learning is important and challenging because the classification of imbalanced datasets is a prevalent problem in machine learning and data mining. Sampling approaches have been proposed to address this issue, and cluster-based oversampling methods have shown great potential because they aim to tackle between-class and within-class imbalance simultaneously. However, all existing clustering-based methods perform clustering only once. Because a priori knowledge is lacking, the number of clusters is often set improperly, which leads to poor clustering performance. In addition, the existing methods are likely to generate noisy instances. To solve these problems, this paper proposes a deep instance envelope network-based imbalanced learning algorithm that combines multilayer fuzzy c-means (MlFCM) with a minimum interlayer discrepancy mechanism based on the maximum mean discrepancy (MIDMD). Using a deep instance envelope network, this algorithm can guarantee high-quality balanced instances in the absence of prior knowledge. First, the MlFCM is applied to the original minority class instances to obtain deep instances and increase the diversity of instances. Then, the MIDMD is proposed to avoid generating noisy instances and to maintain consistency between the layers of instances. Next, the MlFCM and the MIDMD are combined to construct a deep instance envelope network, the MlFC&IDMD. Finally, an imbalanced learning algorithm is proposed based on the MlFC&IDMD. In the experiments, thirty-three popular public datasets are used for verification, and more than ten representative algorithms are used for comparison. The experimental results show that the proposed approach significantly outperforms other popular methods.
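The two building blocks named in the abstract, fuzzy c-means clustering and the maximum mean discrepancy, can be sketched in a few lines of NumPy. This is a minimal illustration of the standard textbook forms only, not the authors' MlFCM, MIDMD, or MlFC&IDMD: the function names (`fuzzy_c_means`, `rbf_kernel`, `mmd2`), the RBF kernel, and every parameter value (`m`, `gamma`, the toy data) are assumptions made here for illustration, and the abstract does not say how the layers are stacked or how the discrepancy is minimized.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns (centers, memberships).

    Assumed defaults: fuzzifier m=2 and random initial memberships.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # each row of memberships sums to 1
    for _ in range(n_iter):
        W = U ** m
        # Centers are membership-weighted means of the instances.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance of every instance to every center, floored to avoid 1/0.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def mmd2(X, Y, gamma=1.0):
    """Biased estimate of the squared maximum mean discrepancy."""
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma).mean())


# Toy minority class made of two small blobs (hypothetical data).
rng = np.random.default_rng(0)
minority = np.vstack([rng.normal(0.0, 0.1, (20, 2)),
                      rng.normal(3.0, 0.1, (20, 2))])

centers, U = fuzzy_c_means(minority, c=2)

# Compare a candidate set of instances against the original ones:
# a smaller value means the two sets are closer in distribution.
near = mmd2(minority, minority + 0.01)
far = mmd2(minority, minority + 5.0)
```

In this sketch a candidate set whose `mmd2` against the original minority instances is small stays close to the original distribution, which is the general intuition behind using a discrepancy measure to screen out noisy synthetic instances.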
