Analysis of diabetes mellitus for early prediction using optimal features selection

Diabetes is a chronic disease, or group of metabolic diseases, in which a person has an elevated level of blood glucose, either because insulin production is inadequate or because the body's cells do not respond properly to insulin. The chronic hyperglycemia of diabetes is associated with long-term damage, dysfunction, and failure of various organs, particularly the eyes, kidneys, nerves, heart, and blood vessels. The objective of this research is to make use of significant features, design a prediction algorithm using machine learning, and find the optimal classifier, the one whose results come closest to clinical outcomes. The proposed method focuses on selecting the attributes that aid in early detection of diabetes mellitus using predictive analysis. The results show that the decision tree and random forest algorithms have the highest specificity, 98.20% and 98.00% respectively, and hold up best for the analysis of diabetic data. The naïve Bayes classifier gives the best accuracy, 82.30%. The research also generalizes the selection of optimal features from a dataset to improve classification accuracy.
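The two metrics reported above, specificity and accuracy, are both derived from a classifier's confusion matrix. The following is a minimal sketch of their standard definitions; the function names and the confusion-matrix counts are hypothetical, chosen only for illustration, and are not taken from the paper's data or code.

```python
def specificity(tn: int, fp: int) -> float:
    """True-negative rate: the share of non-diabetic cases correctly rejected."""
    return tn / (tn + fp)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all cases, diabetic and non-diabetic, classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts for illustration only (not results from the paper).
tp, tn, fp, fn = 30, 90, 10, 20

spec = specificity(tn, fp)       # 90 / (90 + 10)
acc = accuracy(tp, tn, fp, fn)   # (30 + 90) / 150

print(f"specificity = {spec:.2%}, accuracy = {acc:.2%}")
```

Because specificity ignores false negatives, a classifier can score very high on specificity while its overall accuracy is lower, which is why the two metrics can favor different classifiers.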
