Using Cost-Sensitive Learning and Feature Selection Algorithms to Improve the Performance of Imbalanced Classification

The imbalanced data problem is widespread in network intrusion detection, spam filtering, biomedical engineering, finance, and science, and it poses a challenge in many real-life data-intensive applications. When traditional classification algorithms are applied to imbalanced data, the resulting classifier is biased toward the majority class. The General Vector Machine (GVM) algorithm is known to have good generalization ability, but it does not work well for imbalanced classification. The state-of-the-art Binary Ant Lion Optimizer (BALO) algorithm, in turn, has strong exploitation ability and a fast convergence rate. Based on these facts, this paper proposes a Cost-sensitive Feature selection General Vector Machine (CFGVM) algorithm, built on the GVM and BALO algorithms, that tackles the imbalanced classification problem by assigning different cost weights to different classes of samples. In our method, the BALO algorithm determines the cost weights and extracts the more significant features to improve classification performance. Experiments conducted on eleven imbalanced data sets show that the CFGVM algorithm significantly improves the classification performance on minority class samples. Compared with similar algorithms and state-of-the-art algorithms, the proposed algorithm produces significantly better classification results.
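The general idea of searching jointly over cost weights and a feature subset can be illustrated with a minimal sketch. This is not the CFGVM implementation: a class-weighted logistic regression stands in for GVM, a plain random search stands in for BALO, and the G-mean fitness, the cost range [1, 20], the toy data, and all function names are assumptions made only for illustration.

```python
import math
import random

def train_weighted_logreg(X, y, cost_pos, epochs=200, lr=0.1):
    """Stand-in classifier (NOT GVM): logistic regression whose loss
    weights each minority (label 1) sample by cost_pos."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            z = max(-30.0, min(30.0, z))          # avoid overflow in exp
            p = 1.0 / (1.0 + math.exp(-z))
            err = (cost_pos if yi == 1 else 1.0) * (p - yi)
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(w, b, X):
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0 for xi in X]

def g_mean(y_true, y_pred):
    """Geometric mean of minority recall and majority recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(y_true)
    neg = len(y_true) - pos
    return math.sqrt((tp / pos) * (tn / neg))

def apply_mask(X, mask):
    """Keep only the features whose mask entry is truthy."""
    return [[x for x, keep in zip(xi, mask) if keep] for xi in X]

def fitness(X, y, mask, cost_pos):
    # Scored on the training data for brevity; a real study would
    # score on held-out data or with cross-validation.
    Xm = apply_mask(X, mask)
    w, b = train_weighted_logreg(Xm, y, cost_pos)
    return g_mean(y, predict(w, b, Xm))

def random_search(X, y, n_iter=20, seed=0):
    """Stand-in optimizer (NOT BALO): sample (mask, cost weight) pairs
    at random and keep the best-scoring candidate."""
    rng = random.Random(seed)
    n_features = len(X[0])
    best = (None, None, -1.0)
    for _ in range(n_iter):
        mask = [rng.random() < 0.5 for _ in range(n_features)]
        if not any(mask):                          # keep at least one feature
            mask[rng.randrange(n_features)] = True
        cost_pos = rng.uniform(1.0, 20.0)          # assumed cost range
        score = fitness(X, y, mask, cost_pos)
        if score > best[2]:
            best = (mask, cost_pos, score)
    return best

def make_toy_data(seed=0):
    """Hypothetical 9:1 imbalanced data; only feature 0 is informative."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(90)]
    X += [[rng.gauss(2, 1), rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(10)]
    return X, [0] * 90 + [1] * 10

if __name__ == "__main__":
    X, y = make_toy_data()
    mask, cost_pos, score = random_search(X, y)
    print("selected features:", mask)
    print("minority cost weight:", round(cost_pos, 2), "G-mean:", round(score, 3))
```

Each candidate encodes both decisions at once (which features to keep and how heavily to penalize minority-class errors), so a single fitness value drives the search over both; swapping in a different optimizer or classifier only requires replacing `random_search` or `train_weighted_logreg`.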
