Integrating TANBN with cost sensitive classification algorithm for imbalanced data in medical diagnosis

Abstract For imbalanced classification problems, most traditional classification models focus only on finding a classifier that maximizes classification accuracy under a fixed misclassification cost, and do not take into account that the misclassification cost can change with the probability distribution of the samples. Cost-sensitive learning is known to be an effective way to handle imbalanced data classification. We therefore propose an algorithm that integrates TANBN with cost-sensitive classification (AdaC-TANBN) to overcome this drawback and improve classification accuracy. The AdaC-TANBN algorithm trains the classifier with a variable misclassification cost determined by the probability distribution of the samples, and then classifies imbalanced data in medical diagnosis. The effectiveness of the proposed approach is examined on four datasets from the UCI Machine Learning Repository: the Cleveland heart dataset (Heart), the Indian liver patient dataset (ILPD), the Dermatology dataset, and the Cervical cancer risk factors dataset (CCRF). The experimental results indicate that the AdaC-TANBN algorithm outperforms the state-of-the-art methods it was compared with.
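
The idea of a misclassification cost that depends on the sample distribution, rather than being fixed in advance, can be illustrated with a minimal sketch. This is not the AdaC-TANBN algorithm itself: the abstract does not give its cost formula, so the inverse-frequency cost in `class_costs`, the minimum-expected-cost rule in `min_cost_decision`, and the toy labels and posteriors are all assumptions chosen for illustration, and no TANBN structure is modeled.

```python
from collections import Counter


def class_costs(labels):
    """Assumed cost scheme: the cost of misclassifying a class is
    inversely proportional to its frequency in the training labels."""
    counts = Counter(labels)
    n = len(labels)
    return {c: n / (len(counts) * k) for c, k in counts.items()}


def min_cost_decision(posterior, costs):
    """Return the class with the lowest expected misclassification cost.

    posterior maps each class to P(class | x), as any probabilistic
    classifier would supply. Predicting `pred` incurs costs[t] whenever
    the true class t differs from `pred`.
    """
    def expected_cost(pred):
        return sum(p * costs[t] for t, p in posterior.items() if t != pred)

    return min(posterior, key=expected_cost)


# Toy imbalanced sample: 90 majority cases, 10 minority cases (made up).
labels = ["healthy"] * 90 + ["sick"] * 10
costs = class_costs(labels)

# A posterior that plain accuracy maximization would label "healthy".
posterior = {"healthy": 0.7, "sick": 0.3}
decision = min_cost_decision(posterior, costs)
print(costs, decision)
```

Under these assumed costs, missing a minority case is roughly nine times as expensive as a false alarm, so the decision for the example posterior shifts to the minority class even though the majority class is more probable.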
