Important Feature Selection & Accuracy Comparisons of Different Machine Learning Models for Early Diabetes Detection

More than 400 million people worldwide have diabetes. The high-risk factors of diabetic individuals vary dramatically, and many patients suffer complications and avoidable harm. Better identification of high-risk factors would help reduce the rate of complications. Doing so requires analyzing a person's medical record, a body of detailed health information whose review currently depends on doctors and is manual, time-consuming, and subjective. In this work, we introduce an approach to automatically predict type 2 diabetes mellitus (T2DM) using a neural network. The objective of this paper is to find which type of model works best for predicting diabetes. We used the Pima Indian Diabetes dataset, and analyzed it using two methods. The first method applies data recovery followed by feature selection; the selected features are input to an MLP neural network classifier, which achieved an accuracy of 85.15%. The second method applies k-means-based noise reduction followed by feature selection; the resulting features are used with Random Forest, Logistic Regression, and MLP neural network classifiers, and the maximum accuracy obtained among these classifiers is 77.08%. The discussion shows why data recovery with an MLP performs better than k-means-based noise reduction combined with any of these classifiers.
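
The abstract does not say how the k-means-based noise reduction was carried out. As a minimal sketch of one common interpretation (an assumption, not necessarily the authors' procedure), the rows can be clustered and any row whose class label disagrees with the majority label of its cluster dropped as noise. The function names `kmeans` and `kmeans_noise_filter` and the toy data are hypothetical and stand in for the Pima features.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns a cluster index for each point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k random rows
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def kmeans_noise_filter(X, y, k=2, seed=0):
    """Assumed noise rule: keep rows whose label matches their cluster's majority label."""
    assign = kmeans(X, k, seed=seed)
    majority = {}
    for c in range(k):
        labels = [label for label, a in zip(y, assign) if a == c]
        if labels:
            majority[c] = max(set(labels), key=labels.count)
    keep = [i for i, (label, a) in enumerate(zip(y, assign)) if majority[a] == label]
    return [X[i] for i in keep], [y[i] for i in keep]

# Toy data (not the Pima dataset): two separated groups, one mislabeled row in each.
X = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.0],
     [5.0, 5.0], [5.2, 4.9], [4.8, 5.1], [5.1, 5.0]]
y = [0, 0, 0, 1, 1, 1, 1, 0]

X_clean, y_clean = kmeans_noise_filter(X, y)
print(len(X), "rows before filtering,", len(X_clean), "rows after")
```

The filtered rows would then go through feature selection and into the classifiers. Note that a filter of this kind discards the hardest examples, so accuracy measured afterward is not directly comparable to accuracy on unfiltered data.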
