A bi-objective hybrid optimization algorithm to reduce noise and data dimension in diabetes diagnosis using support vector machines

Abstract

Diabetes mellitus has attracted considerable attention from data-mining researchers because of the serious health complications it causes in affected people and its economic burden on healthcare systems. To identify the main causes of the disease, researchers examine factors such as patients' lifestyle and hereditary information. The goal of data mining in this context is to find patterns that make early detection and appropriate treatment of the disease easier. Because therapeutic and diagnostic settings involve large volumes of data, determining the intended treatment within a short period of time becomes almost impossible. This motivates the use of preprocessing and data-reduction techniques, in which clustering and meta-heuristic algorithms play important roles. In this paper, a method based on the k-means clustering algorithm is first used to detect and remove outliers. Then, to select significant and effective features, four bi-objective meta-heuristic algorithms are employed to find the smallest number of significant features that yields the highest classification accuracy with support vector machines (SVM). The constructed model is validated with 10-fold cross-validation (CV). Results on real case data show that the multi-objective firefly algorithm (MOFA) and the multi-objective imperialist competitive algorithm (MOICA), both reaching 100% classification accuracy, outperform the non-dominated sorting genetic algorithm (NSGA-II) and multi-objective particle swarm optimization (MOPSO), which reach 98.2% and 94.6%, respectively.
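The abstract does not state how the k-means step decides which records are outliers. As an illustration only, the Python sketch below assumes a common distance-based rule: after clustering, a record is flagged when its distance to its own centroid exceeds the mean of all point-to-centroid distances by more than `z` standard deviations. The function names `kmeans` and `kmeans_outliers`, the threshold `z=2.0`, and the ten random restarts are hypothetical choices, not the paper's settings.

```python
import math
import random
from statistics import mean, pstdev


def kmeans(points, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's k-means with random restarts; returns (centroids, labels) of the lowest-inertia run."""
    rng = random.Random(seed)
    best = None
    for _run in range(n_init):
        centroids = [list(p) for p in rng.sample(points, k)]  # k randomly chosen rows as starting centroids
        labels = None
        for _step in range(max_iter):
            # Assignment step: attach every point to its nearest centroid.
            new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
            if new_labels == labels:
                break  # assignments stopped changing, so this run has converged
            labels = new_labels
            # Update step: move each non-empty cluster's centroid to the mean of its members.
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                if members:
                    centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        # Within-cluster sum of squares, used to pick the best restart.
        inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
        if best is None or inertia < best[0]:
            best = (inertia, centroids, labels)
    return best[1], best[2]


def kmeans_outliers(points, k, z=2.0, seed=0):
    """Return (kept, outliers). ASSUMED rule: a point is an outlier when its distance to its
    own centroid exceeds mean + z * std of all point-to-centroid distances."""
    centroids, labels = kmeans(points, k, seed=seed)
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    cutoff = mean(dists) + z * pstdev(dists)
    kept = [p for p, d in zip(points, dists) if d <= cutoff]
    outliers = [p for p, d in zip(points, dists) if d > cutoff]
    return kept, outliers
```

On real records the features should be standardized first; otherwise Euclidean distance is dominated by whichever attribute has the largest numeric range. Because k-means is sensitive to its starting centroids, the sketch keeps the restart with the lowest within-cluster sum of squares.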
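The feature-selection stage scores each candidate subset on two objectives that are both minimized: the number of selected features and the cross-validated error rate (one minus accuracy). The sketch below evaluates a hypothetical 0/1 `mask` with 10-fold cross-validation. To stay within the standard library, it substitutes a linear SVM trained with the Pegasos sub-gradient method for a library SVM. The abstract gives neither the paper's kernel, regularization constant, nor fold construction, so `lam`, `epochs`, and the unshuffled contiguous folds are assumptions.

```python
import random


def kfold_indices(n, k):
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    start = 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.
    Labels must be +1/-1; a constant 1.0 column lets the bias be learned as a weight."""
    rng = random.Random(seed)
    rows = [list(x) + [1.0] for x in X]
    w = [0.0] * len(rows[0])
    t = 0
    for _epoch in range(epochs):
        order = list(range(len(rows)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, rows[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # step on the L2 regularizer (shrink weights)
            if margin < 1:  # hinge loss is active: add the sub-gradient step eta * y * x
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, rows[i])]
    return w


def objectives(mask, X, y, k=10):
    """Bi-objective fitness of a 0/1 feature mask, both to be minimized:
    (number of selected features, k-fold cross-validated error rate)."""
    cols = [j for j, bit in enumerate(mask) if bit]
    Xs = [[row[j] for j in cols] for row in X]
    wrong = 0
    for train, test in kfold_indices(len(X), k):
        w = train_linear_svm([Xs[i] for i in train], [y[i] for i in train])
        for i in test:
            score = sum(wj * xj for wj, xj in zip(w, Xs[i] + [1.0]))
            wrong += (1 if score >= 0 else -1) != y[i]
    return len(cols), wrong / len(X)
```

A meta-heuristic would call `objectives` once per candidate mask in its population. In practice the folds are usually shuffled and stratified by class, so that each fold keeps the overall proportion of diabetic and non-diabetic records.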
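Neither objective can be improved without possibly worsening the other, so candidate subsets are compared by Pareto dominance rather than by a single score. The sketch below shows the dominance test and the fast non-dominated sorting procedure introduced with NSGA-II (Deb et al., 2002), applied to hypothetical `(feature count, error)` pairs. It is a generic illustration. The abstract does not describe how the four algorithms in the study maintain or rank their non-dominated solutions.

```python
def dominates(a, b):
    """Pareto dominance for minimization: a is no worse than b everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(points):
    """NSGA-II fast non-dominated sort; returns fronts as lists of indices, best front first."""
    n = len(points)
    dominated_set = [[] for _ in range(n)]  # S_p: indices that point p dominates
    dom_count = [0] * n                     # n_p: number of points that dominate p
    fronts = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(points[p], points[q]):
                dominated_set[p].append(q)
            elif dominates(points[q], points[p]):
                dom_count[p] += 1
        if dom_count[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        next_front = []
        for p in fronts[i]:
            for q in dominated_set[p]:
                dom_count[q] -= 1  # p's front is settled, so q loses one dominator
                if dom_count[q] == 0:
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front
```

The first front is the trade-off set the abstract refers to. No other candidate dominates any of its members: none uses no more features while having a lower error, and none uses fewer features without a higher error.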
