Medical imbalanced data classification

Article history: Received: 19 March 2017; Accepted: 04 April 2017; Online: 15 April 2017

Abstract: Imbalanced datasets are a common problem in health applications. In medical data classification, the number of samples per class is often imbalanced, with at least one class constituting only a very small minority of the data. At the same time, this imbalance is a difficult problem for most machine learning algorithms. Many works have addressed the classification of imbalanced datasets. In this paper, we propose a learning method based on a cost-sensitive extension of the Least Mean Square (LMS) algorithm that penalizes the errors of different samples with different weights, together with some rules of thumb for determining those weights. After the balancing phase, we apply different techniques (Support Vector Machine [SVM], K-Nearest Neighbor [K-NN], and Multilayer Perceptron [MLP]) to the balanced datasets. We also compare the results obtained before and after applying the balancing method. We obtain the best results compared with the literature, with a classification accuracy of 100%.
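The idea of a cost-sensitive LMS can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state their rules of thumb, so the weighting used here (cost inversely proportional to class frequency, the same heuristic as scikit-learn's `class_weight="balanced"`) is an assumption, as are the function names `class_costs`, `cost_sensitive_lms`, and `predict`, the label encoding (+1 / -1), and the hyperparameters.

```python
import numpy as np


def class_costs(y):
    """Assumed rule of thumb: cost = N / (n_classes * class_count).

    Rare classes get a larger cost, so their errors weigh more.
    """
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))


def cost_sensitive_lms(X, y, costs, lr=0.01, epochs=50, seed=0):
    """Stochastic LMS where each sample's update is scaled by its class cost.

    Minimizes sum_i c_i * (y_i - w.x_i)^2 with labels y_i in {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            error = y[i] - Xb[i] @ w
            # Standard LMS step, multiplied by the cost of the sample's class
            w += lr * costs[int(y[i])] * error * Xb[i]
    return w


def predict(X, w):
    """Classify by the sign of the linear output (0 maps to +1)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.where(Xb @ w >= 0.0, 1, -1)
```

A typical use would be `w = cost_sensitive_lms(X, y, class_costs(y))` followed by `predict(X_new, w)`. Scaling the step by the class cost is equivalent to giving minority-class samples a larger learning rate, so the learning rate may need to be lowered when the imbalance is severe.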
