Diabetes Detection Models in Mexican Patients by Combining Machine Learning Algorithms and Feature Selection Techniques for Clinical and Paraclinical Attributes: A Comparative Evaluation

The development of medical diagnostic models to support healthcare professionals has grown rapidly in recent years. Among the health conditions affecting the global population, diabetes is a significant concern. Machine learning algorithms have been widely explored for generating diabetes detection models, using diverse datasets derived primarily from clinical studies. The performance of these models depends heavily on the choice of classifier algorithm and on the quality of the dataset, so optimizing the input data by selecting relevant features is essential for accurate classification. This research investigates diabetes detection models that integrate two feature selection techniques: the Akaike information criterion and genetic algorithms. These techniques are combined with six classifier algorithms: support vector machine, random forest, k-nearest neighbor, gradient boosting, extra trees, and naive Bayes. The models are generated from clinical and paraclinical features, then evaluated and compared with existing approaches. The results show superior performance, with accuracies above 94%. In addition, feature selection makes it possible to work with a reduced dataset. The study thus underscores the role of feature selection in improving the performance of diabetes detection models. By selecting only the relevant features, this approach contributes to medical diagnostic capabilities and supports healthcare professionals in making informed decisions about diabetes diagnosis and treatment.
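The combination of feature selection with several classifiers can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic (`make_classification`), the helper names `aic`, `forward_aic_selection`, and `compare` are hypothetical, the AIC step is assumed to be a greedy forward search over a nearly unpenalized logistic regression, Gaussian naive Bayes is assumed as the naive Bayes variant, and all classifiers use scikit-learn defaults. The genetic algorithm step is not shown.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def aic(X, y, cols):
    """AIC = 2k - 2 ln L for a logistic model restricted to the columns `cols`."""
    # A large C makes the L2 penalty negligible, approximating a maximum-likelihood fit.
    model = LogisticRegression(C=1e6, max_iter=2000).fit(X[:, cols], y)
    neg_log_lik = log_loss(y, model.predict_proba(X[:, cols]), normalize=False)
    k = len(cols) + 1  # one coefficient per feature plus the intercept
    return 2 * k + 2 * neg_log_lik


def forward_aic_selection(X, y):
    """Greedy forward search: add the feature that lowers AIC most; stop when none does."""
    p = y.mean()
    # AIC of the intercept-only model (k = 1), used as the starting reference.
    best = 2 - 2 * np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        score, j = min((aic(X, y, selected + [j]), j) for j in remaining)
        if score >= best:
            break
        best = score
        selected.append(j)
        remaining.remove(j)
    return selected


# The six classifier families named in the abstract, with default hyperparameters.
# Distance- and margin-based models are wrapped with a scaler.
CLASSIFIERS = {
    "Support vector machine": make_pipeline(StandardScaler(), SVC()),
    "Random forest": RandomForestClassifier(random_state=0),
    "k-nearest neighbor": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Gradient boosting": GradientBoostingClassifier(random_state=0),
    "Extra trees": ExtraTreesClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
}


def compare(X, y, cols):
    """Mean 5-fold cross-validated accuracy of each classifier on the selected columns."""
    return {name: cross_val_score(clf, X[:, cols], y, cv=5).mean()
            for name, clf in CLASSIFIERS.items()}


# Synthetic stand-in for a clinical dataset: 12 features, only 4 of them informative.
X, y = make_classification(n_samples=300, n_features=12, n_informative=4,
                           n_redundant=2, random_state=0)
selected = forward_aic_selection(X, y)
results = compare(X, y, selected)

print("Selected feature indices:", selected)
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```

In this sketch the feature search runs on the full dataset before cross-validation, which is acceptable for illustration but can inflate accuracy estimates; a stricter evaluation would repeat the selection inside each training fold.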
