Simultaneous Modeling of Multiple Complications for Risk Profiling in Diabetes Care

Type 2 diabetes mellitus (T2DM) is a chronic disease that often results in multiple complications. Risk prediction and profiling of T2DM complications are critical for healthcare professionals who design personalized treatment plans to improve patient outcomes in diabetes care. In this paper, we study the risk of developing complications after the initial T2DM diagnosis, using longitudinal patient records. We propose a novel multi-task learning approach to simultaneously model multiple complications, where each task corresponds to the risk modeling of one complication. Specifically, the proposed method captures the relationships (1) between the risks of multiple T2DM complications, (2) between the different risk factors, and (3) between the risk factor selection patterns. The method uses coefficient shrinkage to identify an informative subset of risk factors from high-dimensional data, and uses a hierarchical Bayesian framework to allow domain knowledge to be incorporated as priors. The proposed method is favorable for healthcare applications because, in addition to improved prediction performance, relationships among the different risks and risk factors are also identified. Extensive experimental results on a large electronic medical claims database show that the proposed method outperforms state-of-the-art models by a significant margin. Furthermore, we show that the risk associations learned and the risk factors identified lead to meaningful clinical insights.

CCS CONCEPTS: • Information systems → Data mining; • Applied computing → Health informatics
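To make the idea of simultaneous modeling with coefficient shrinkage concrete, the following is a minimal sketch and not the method proposed in the paper. It swaps the hierarchical Bayesian framework for a much simpler stand-in: multi-task logistic regression with a row-wise group-lasso penalty, fitted by proximal gradient descent, so that a risk factor is kept or dropped jointly across all complications. It does not model the correlations between risks or between risk factors. All names (`make_toy_data`, `fit_multitask`, `group_soft_threshold`), the synthetic data, and the hyperparameters are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def group_soft_threshold(row, thresh):
    # Shrink a whole row (one risk factor across all tasks) toward zero;
    # the row becomes exactly zero when its Euclidean norm is <= thresh.
    norm = math.sqrt(sum(v * v for v in row))
    if norm == 0.0:
        return [0.0 for _ in row]
    scale = max(0.0, 1.0 - thresh / norm)
    return [scale * v for v in row]


def make_toy_data(seed=0, n=200, d=6, n_tasks=3):
    # Hypothetical data: only the first two features drive every task.
    rng = random.Random(seed)
    X, Y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        logit = 2.0 * x[0] - 2.0 * x[1]
        y = [1 if rng.random() < sigmoid(logit) else 0 for _ in range(n_tasks)]
        X.append(x)
        Y.append(y)
    return X, Y


def fit_multitask(X, Y, lam=0.1, lr=0.5, n_iter=300):
    # X: n x d features, Y: n x T binary labels (one column per complication).
    # Returns W (d x T): one coefficient row per risk factor.
    n, d, n_tasks = len(X), len(X[0]), len(Y[0])
    W = [[0.0] * n_tasks for _ in range(d)]
    for _ in range(n_iter):
        # Gradient of the mean logistic loss, summed over tasks.
        G = [[0.0] * n_tasks for _ in range(d)]
        for x, y in zip(X, Y):
            for t in range(n_tasks):
                p = sigmoid(sum(x[j] * W[j][t] for j in range(d)))
                err = (p - y[t]) / n
                for j in range(d):
                    G[j][t] += err * x[j]
        # Gradient step, then the proximal step for the group-lasso penalty.
        for j in range(d):
            stepped = [W[j][t] - lr * G[j][t] for t in range(n_tasks)]
            W[j] = group_soft_threshold(stepped, lr * lam)
    return W


if __name__ == "__main__":
    X, Y = make_toy_data()
    W = fit_multitask(X, Y)
    for j, row in enumerate(W):
        print(j, [round(v, 3) for v in row])
```

In this toy setting the rows of `W` for the two informative features should carry most of the weight, while the rows for the noise features are shrunk toward zero together. A Bayesian treatment such as the one the abstract describes would instead place priors over the coefficients and task relationships.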
