MMD-encouraging convolutional autoencoder: a novel classification algorithm for imbalanced data

The imbalanced data classification problem arises widely in commercial and industrial activities. It refers to scenarios in which the number of samples differs considerably across classes, which significantly degrades the performance of traditional classification algorithms. Previous approaches have focused mainly on resampling and algorithm-level adjustments, while neglecting the improvement of feature learning. In this study, we propose a novel algorithm for imbalanced data classification from the perspective of feature learning: the Maximum Mean Discrepancy-Encouraging Convolutional Autoencoder (MMD-CAE). The algorithm adopts a two-phase training process with separate objectives. A cross-entropy loss measures the reconstruction error of the data, while the Maximum Mean Discrepancy (MMD) with an intra-variance constraint stimulates feature discrepancy in the bottleneck layer. By encouraging the maximization of the MMD between the samples of the two classes, and by mapping the original space into a higher-dimensional space via the kernel trick, the learned features form a more effective feature space. The proposed algorithm is evaluated on ten groups of samples with different imbalance ratios. Recall, F1 score, G-mean and AUC results verify that the proposed algorithm surpasses existing state-of-the-art methods in this field and generalizes better. This study could shed new light on related work by constructing a more effective feature space through the proposed intra-variance-constrained MMD, and by providing the holistic MMD-CAE algorithm for imbalanced data classification.
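To make the bottleneck objective concrete, the following is a minimal sketch (not the authors' code) of an RBF-kernel MMD term between minority- and majority-class bottleneck features combined with an intra-class variance penalty. PyTorch is assumed; the function names `mmd_rbf`, `intra_variance`, and `bottleneck_loss`, as well as the kernel bandwidth and weighting hyperparameters, are illustrative and not taken from the paper.

```python
# Sketch of the MMD-with-intra-variance-constraint bottleneck loss.
# Assumes PyTorch; all names and hyperparameters are illustrative.
import torch


def mmd_rbf(x, y, sigma=1.0):
    """Biased RBF-kernel MMD^2 between two batches of feature vectors."""
    def kernel(a, b):
        # Pairwise squared Euclidean distances -> Gaussian kernel matrix.
        d2 = torch.cdist(a, b, p=2) ** 2
        return torch.exp(-d2 / (2.0 * sigma ** 2))

    k_xx = kernel(x, x).mean()
    k_yy = kernel(y, y).mean()
    k_xy = kernel(x, y).mean()
    return k_xx + k_yy - 2.0 * k_xy


def intra_variance(features):
    """Mean squared distance of features to their class centroid."""
    center = features.mean(dim=0, keepdim=True)
    return ((features - center) ** 2).sum(dim=1).mean()


def bottleneck_loss(z_minority, z_majority, lam=0.1):
    """Encourage large inter-class MMD while constraining intra-class spread.

    The MMD term is negated because the discrepancy between the two classes
    is to be maximized; `lam` weights the intra-variance penalty (an assumed
    hyperparameter, not a value reported in the paper).
    """
    mmd = mmd_rbf(z_minority, z_majority)
    var = intra_variance(z_minority) + intra_variance(z_majority)
    return -mmd + lam * var


if __name__ == "__main__":
    # Toy usage with random 32-dimensional bottleneck features.
    z_min = torch.randn(16, 32)   # minority-class batch
    z_maj = torch.randn(128, 32)  # majority-class batch
    print(bottleneck_loss(z_min, z_maj))
```

In a full two-phase setup this term would be added to the cross-entropy reconstruction loss of the convolutional autoencoder; the split into phases and the exact weighting follow the paper and are not shown here.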
