A Low-Cost Method for Multiple Disease Prediction

Recently, in response to the rising costs of healthcare services, employers that are financially responsible for their workforce's healthcare costs have been investing in health improvement programs for their employees. A main objective of these so-called "wellness programs" is to reduce the incidence of chronic illnesses such as cardiovascular disease, cancer, diabetes, and obesity, and thereby reduce future medical costs. Most wellness programs include an annual screening to identify the individuals at highest risk of developing chronic disease. Once these individuals are identified, the company can invest in interventions to reduce their risk. However, measuring many biomarkers per employee makes the screening procedure costly. We propose a data-driven statistical method that addresses this challenge by minimizing the number of biomarkers in the screening procedure while maximizing predictive power across a broad spectrum of diseases. Our solution draws on multi-task learning and group dimensionality reduction techniques from machine learning and statistics. We validate the proposed solution empirically using data from two different electronic medical record systems and compare it against a statistical benchmark.
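The abstract does not specify the exact estimator. As a minimal sketch of the general idea only, assuming a multi-task logistic regression with an L2,1 (group lasso) penalty, one can give each biomarker a row of coefficients shared across all diseases and penalize the norm of each row. Whole rows are then driven exactly to zero, so a biomarker is either dropped from the screening for every disease or kept. This is not presented as the authors' method; the function names, the proximal-gradient solver, the value `lam=0.1`, and the synthetic data are all hypothetical choices for illustration.

```python
import numpy as np


def sigmoid(z):
    # Clip logits to avoid overflow in exp for extreme values.
    z = np.clip(z, -35.0, 35.0)
    return 1.0 / (1.0 + np.exp(-z))


def group_soft_threshold(W, thresh):
    """Proximal operator of thresh * sum_j ||W[j, :]||_2 (row-wise group shrinkage).

    Row j holds one biomarker's weights across all diseases, so the whole row
    is either shrunk toward zero or set exactly to zero.
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - thresh / np.maximum(norms, 1e-12))
    return W * scale


def fit_multitask_group_lasso(X, Y, lam, n_iter=2000):
    """Hypothetical proximal-gradient solver for L2,1-penalized multi-task logistic regression.

    X: (n, d) biomarker matrix; Y: (n, k) binary labels, one column per disease.
    Minimizes (1/n) * sum of per-task logistic losses + lam * sum_j ||W[j, :]||_2.
    Intercepts b are not penalized. Returns W with shape (d, k) and b with shape (k,).
    """
    n, d = X.shape
    k = Y.shape[1]
    W = np.zeros((d, k))
    b = np.zeros(k)
    # Step size 1/L, where L = ||[X, 1]||_2^2 / (4n) bounds the gradient's
    # Lipschitz constant (the logistic curvature is at most 1/4).
    Xa = np.hstack([X, np.ones((n, 1))])
    step = 4.0 * n / np.linalg.norm(Xa, 2) ** 2
    for _ in range(n_iter):
        R = (sigmoid(X @ W + b) - Y) / n  # (n, k) scaled residuals
        W = group_soft_threshold(W - step * (X.T @ R), step * lam)
        b = b - step * R.sum(axis=0)
    return W, b


# Hypothetical synthetic data: 20 candidate biomarkers, 3 diseases, and only
# biomarkers 0, 1, 2 carry signal (shared across all three diseases).
rng = np.random.default_rng(0)
n, d, k = 500, 20, 3
X = rng.standard_normal((n, d))
W_true = np.zeros((d, k))
W_true[:3] = [[2.0, 1.5, -1.0], [-1.5, 2.0, 1.0], [1.0, -1.0, 2.0]]
Y = (rng.random((n, k)) < sigmoid(X @ W_true)).astype(float)

W, b = fit_multitask_group_lasso(X, Y, lam=0.1)
selected = np.flatnonzero(np.linalg.norm(W, axis=1) > 0)
print("biomarkers kept for screening:", selected)
```

In this sketch, `lam` acts as the cost knob: a larger value zeroes more rows and so yields a cheaper screening panel, at some loss of predictive power for each disease.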
