A multi-class classification model for supporting the diagnosis of type II diabetes mellitus

Background: Numerous studies have used machine-learning techniques to predict the early onset of type 2 diabetes mellitus. However, few studies have attempted to predict the appropriate diagnosis code for a patient with type 2 diabetes mellitus. Ensemble techniques such as bagging and boosting have been used to an even lesser extent. The present study aims to identify appropriate diagnosis codes for type 2 diabetes mellitus patients by building a parsimonious multi-class prediction model that uses a minimal set of features. In addition, the importance of each feature for predicting the diagnosis code is reported.

Methods: This study included 149 patients with type 2 diabetes mellitus, sampled at a large hospital in Taiwan between November 2017 and May 2018. Predictive models were built with instance-based, decision-tree, deep neural network, and ensemble machine-learning algorithms. Model performance was assessed by average accuracy, area under the receiver operating characteristic curve, Matthews correlation coefficient, macro-precision, recall, the weighted average of precision and recall, and model processing time. Information gain and gain ratio were used to quantify feature importance.

Results: Most algorithms, with the exception of the deep neural network, performed well on all performance indices on both the training and the testing datasets. Ten features for determining the diagnosis code of type 2 diabetes mellitus were identified, along with their relative importance.

Conclusions: Our proposed predictive model can be developed further into a clinical diagnosis support system or integrated into existing healthcare information systems. Either application can effectively support physicians in diagnosing type 2 diabetes mellitus patients and foster better patient-care planning.
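The multi-class performance indices listed in the Methods can all be derived from a confusion matrix. The following is a minimal sketch in plain Python, assuming the one-vs-rest "average accuracy" and macro-averaged precision and recall as defined by Sokolova and Lapalme (2009), an F-score with beta = 1 for the weighted average of precision and recall, and the multi-class (Gorodkin) generalization of the Matthews correlation coefficient. The abstract does not state the authors' exact formulas or software, so this is not their implementation; the function names and the class labels A, B and C are hypothetical, and AUC and processing time are omitted.

```python
import math
from typing import Dict, List, Sequence


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str]) -> List[List[int]]:
    """Count predictions; rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    return cm


def multiclass_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Average accuracy, macro precision/recall/F1 and multi-class MCC."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    row_sums = [sum(row) for row in cm]                             # true counts per class
    col_sums = [sum(cm[r][c] for r in range(k)) for c in range(k)]  # predicted counts per class

    avg_acc = 0.0
    precisions, recalls = [], []
    for i in range(k):
        # One-vs-rest counts for class i.
        tp = cm[i][i]
        fn = row_sums[i] - tp
        fp = col_sums[i] - tp
        tn = total - tp - fn - fp
        avg_acc += (tp + tn) / total
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)

    macro_p = sum(precisions) / k
    macro_r = sum(recalls) / k
    # F-score with beta = 1, taken over the macro-averaged precision and recall.
    macro_f1 = (2 * macro_p * macro_r / (macro_p + macro_r)) if macro_p + macro_r else 0.0

    # Multi-class MCC (Gorodkin's R_K) computed from the confusion matrix.
    correct = sum(cm[i][i] for i in range(k))
    num = correct * total - sum(p * t for p, t in zip(col_sums, row_sums))
    den = math.sqrt((total ** 2 - sum(p * p for p in col_sums))
                    * (total ** 2 - sum(t * t for t in row_sums)))
    mcc = num / den if den else 0.0

    return {"average_accuracy": avg_acc / k, "macro_precision": macro_p,
            "macro_recall": macro_r, "macro_f1": macro_f1, "mcc": mcc}


if __name__ == "__main__":
    # Hypothetical labels and predictions, for illustration only.
    truth = ["A", "A", "A", "B", "B", "C", "C", "C"]
    preds = ["A", "A", "B", "B", "B", "C", "C", "A"]
    print(multiclass_metrics(confusion_matrix(truth, preds, ["A", "B", "C"])))
```

For the hypothetical predictions in the `__main__` block, average accuracy is 5/6 (about 0.833) while the multi-class MCC is 9/14 (about 0.643). Because average accuracy credits the true negatives of every class, it can look more favorable than MCC on the same predictions, which is one reason to report several indices side by side.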
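Information gain and gain ratio, the feature-importance measures named in the Methods, are entropy-based criteria from Quinlan's decision-tree work; gain ratio divides information gain by the entropy of the feature itself (its split information). The sketch below assumes categorical or already-discretized features and base-2 logarithms; C4.5-style threshold search for continuous attributes is omitted, and the function names and toy data are illustrative rather than taken from the study.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (base 2) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Reduction in label entropy after splitting on a categorical feature."""
    groups: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    n = len(labels)
    # Weighted entropy of the labels within each feature value.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain normalized by the split information (entropy of the feature)."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical diagnosis-code labels and two hypothetical features.
    labels = ["A", "A", "B", "B"]
    print(information_gain(["x", "x", "y", "y"], labels), gain_ratio(["x", "x", "y", "y"], labels))
    print(information_gain(["x", "y", "z", "w"], labels), gain_ratio(["x", "y", "z", "w"], labels))
```

Both hypothetical features separate the classes perfectly, so each has an information gain of 1 bit. The feature with four distinct values, however, has a split information of 2 bits and therefore a gain ratio of only 0.5. This penalty on many-valued features is why gain ratio is commonly reported alongside information gain.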
