Scalable Healthcare Assessment for Diabetic Patients Using Deep Learning on Multiple GPUs

The large-scale parallel computation available on the new generation of graphics processing units (GPUs) and on cloud-based services can be exploited for healthcare data analysis. Furthermore, workstations suited for deep learning are usually equipped with multiple GPUs, which allows the workload for larger datasets to be distributed across GPUs while exploiting the parallelism within each GPU. In this paper, we use distributed and parallel computation to analyze healthcare data efficiently with deep learning techniques. We demonstrate the scalability and computational benefits of this approach with a case study of the longitudinal assessment of approximately 150 000 type 2 diabetic patients. Type 2 diabetes mellitus (T2DM) is the fourth leading cause of mortality worldwide, and its prevalence is rising. T2DM leads to adverse events such as acute myocardial infarction, major amputations, and avoidable hospitalizations. This paper aims to establish a relation between laboratory and medical assessment variables and the occurrence of these adverse events, and to predict the events using machine learning techniques. We use a raw database provided by the Basque Health Service, Spain, to conduct this study. This database contains 150 156 patients diagnosed with T2DM, for whom 321 laboratory and medical assessment variables recorded over four years are available. Adverse events in T2DM patients were predicted using both classical machine learning and deep learning techniques, and the predictions were evaluated using accuracy, precision, recall, and F1-score. For the prediction of acute myocardial infarction, the best performance is obtained by linear discriminant analysis (LDA) and support vector machines (SVMs), in both their balanced and weighted variants, with an accuracy of 97%; for hospital admission for avoidable causes, the best performance is obtained by balanced LDA and balanced SVMs, both with an accuracy of 92%.
For the prediction of the incidence of at least one adverse event, the best-performing model is the recurrent neural network trained on a balanced dataset, with an accuracy of 94.6%. Performing and comparing these experiments was made possible by a workstation with multiple GPUs. This setup allows for scalability to larger datasets. Such models are also cloud-ready and can be deployed on similar architectures hosted on AWS for even larger datasets.
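The four evaluation metrics named above, and one common way of deriving class weights for a "weighted" model, can be illustrated with a minimal sketch. The toy labels, the function names, and the inverse-frequency weighting heuristic are assumptions made for illustration; the abstract does not state how the balanced or weighted variants were actually constructed.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    A common heuristic for imbalanced data; assumed here, not taken from the paper.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1-score for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy cohort: 5 patients with an adverse event, 95 without.
y_true = [1] * 5 + [0] * 95
# A trivial model that never predicts an adverse event.
y_pred = [0] * 100

metrics = binary_metrics(y_true, y_pred)
weights = balanced_class_weights(y_true)
print(metrics)
print(weights)
```

On this toy cohort the trivial model scores high accuracy while its recall is zero, which is why precision, recall, and F1-score are reported alongside accuracy when adverse events are rare. The rare class also receives a much larger weight than the common one under the assumed heuristic.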
