A predictive analytics framework for identifying patients at risk of developing multiple medical complications caused by chronic diseases

Chronic diseases often cause several medical complications. This paper aims to predict multiple complications among patients with a chronic disease. Existing studies typically use single-task learning algorithms that predict each complication independently, which implicitly assumes that the complications of a chronic disease are uncorrelated. We compare two methods, independent prediction of complications with single-task learning and concurrent prediction of complications with multi-task learning, and show that medical complications of chronic diseases can be correlated. In a case study, we compare the performance of the two methods in predicting complications of hypertrophic cardiomyopathy, using 106 predictors from 1078 electronic medical records collected from April 2009 through April 2017. Both methods are implemented with logistic regression, artificial neural networks, decision trees, and support vector machines. The results show that multi-task learning with logistic regression improves predictive performance in terms of both discrimination and calibration.
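The contrast between the two methods can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not state the multi-task formulation, so the sketch assumes a simple mean-regularised variant in which each complication's logistic regression weights are pulled toward the across-task mean. The data are synthetic stand-ins for the medical records, and all names (`fit_logistic`, `coupling`, `auc`, `brier`) are hypothetical. Discrimination is summarised by AUC, and the Brier score serves as a rough calibration-related measure.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, Y, coupling=0.0, l2=1e-2, lr=0.1, n_iter=500):
    """Fit one logistic model per column of Y by gradient descent.

    coupling = 0 gives independent single-task models.
    coupling > 0 gives a multi-task fit: each task's weights are pulled
    toward the across-task mean (an assumed formulation, not the paper's).
    """
    n, d = X.shape
    k = Y.shape[1]
    W = np.zeros((d, k))   # one weight column per complication
    b = np.zeros(k)        # one intercept per complication
    for _ in range(n_iter):
        P = sigmoid(X @ W + b)
        grad_W = X.T @ (P - Y) / n + l2 * W
        if coupling > 0:
            # Gradient of (coupling / 2) * sum_t ||w_t - mean_t(w)||^2
            grad_W += coupling * (W - W.mean(axis=1, keepdims=True))
        grad_b = (P - Y).mean(axis=0)
        W -= lr * grad_W
        b -= lr * grad_b
    return W, b

def predict_proba(X, W, b):
    return sigmoid(X @ W + b)

def auc(y, p):
    """Probability that a positive case is ranked above a negative one."""
    pos, neg = p[y == 1], p[y == 0]
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

def brier(y, p):
    """Mean squared error of predicted probabilities (lower is better)."""
    return float(np.mean((p - y) ** 2))

# Synthetic stand-in data: 3 correlated "complications" sharing a common signal.
rng = np.random.default_rng(0)
n, d, k = 600, 10, 3
X = rng.normal(size=(n, d))
w_shared = rng.normal(size=(d, 1))
W_true = w_shared + 0.3 * rng.normal(size=(d, k))
Y = (rng.random((n, k)) < sigmoid(X @ W_true)).astype(float)
X_tr, X_te, Y_tr, Y_te = X[:400], X[400:], Y[:400], Y[400:]

W_stl, b_stl = fit_logistic(X_tr, Y_tr, coupling=0.0)  # single-task
W_mtl, b_mtl = fit_logistic(X_tr, Y_tr, coupling=1.0)  # multi-task (assumed form)

for name, (W, b) in {"single-task": (W_stl, b_stl),
                     "multi-task": (W_mtl, b_mtl)}.items():
    P = predict_proba(X_te, W, b)
    aucs = [auc(Y_te[:, t], P[:, t]) for t in range(k)]
    briers = [brier(Y_te[:, t], P[:, t]) for t in range(k)]
    print(name, "mean AUC:", round(np.mean(aucs), 3),
          "mean Brier:", round(np.mean(briers), 3))
```

Whether the coupled fit outperforms the independent one depends on how strongly the tasks are related and on the chosen `coupling` value, so the printed numbers only illustrate the comparison and say nothing about the paper's results.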
