Precision Health–Enabled Machine Learning to Identify Need for Wraparound Social Services Using Patient- and Population-Level Data Sets: Algorithm Development and Validation

Background: Emerging interest in precision health and the increasing availability of patient- and population-level data sets present considerable potential to enable analytical approaches that identify and mitigate the negative effects of social factors on health. These issues are not satisfactorily addressed in typical medical care encounters, and thus, opportunities to improve health outcomes, reduce costs, and improve coordination of care are not realized. Furthermore, methodological expertise on the use of varied patient- and population-level data sets and machine learning to predict need for supplemental services is limited.

Objective: The objective of this study was to leverage a comprehensive range of clinical, behavioral, social risk, and social determinants of health factors to develop decision models capable of identifying patients in need of various wraparound social services.

Methods: We used comprehensive patient- and population-level data sets to build decision models capable of predicting need for behavioral health, dietitian, social work, or other social service referrals within a safety-net health system. We evaluated the models using area under the receiver operating characteristic curve (AUROC), sensitivity, precision, F1 score, and specificity. We also evaluated the value of population-level social determinants of health data sets in improving the performance of the models.

Results: Decision models for each wraparound service demonstrated performance measures ranging from 59.2% to 99.3%. These results were statistically superior to the performance measures demonstrated by our previous models, which used a limited data set and whose performance measures ranged from 38.2% to 88.3% (behavioral health: F1 score P<.001, AUROC P=.01; social work: F1 score P<.001, AUROC P=.03; dietitian: F1 score P=.001, AUROC P=.001; other: F1 score P=.01, AUROC P=.02). However, including additional population-level social determinants of health data did not yield statistically significant performance improvements (behavioral health: F1 score P=.08, AUROC P=.09; social work: F1 score P=.16, AUROC P=.09; dietitian: F1 score P=.08, AUROC P=.14; other: F1 score P=.33, AUROC P=.21) in predicting the need for referral in our population of vulnerable patients seeking care at a safety-net provider.

Conclusions: Precision health–enabled decision models that leverage a wide range of patient- and population-level data sets and advanced machine learning methods are capable of predicting need for various wraparound social services with good performance.
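The five performance measures named in the Methods can be illustrated with a minimal sketch. This is not the study's implementation: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions chosen for illustration. AUROC is computed here as the fraction of positive/negative pairs in which the positive case receives the higher score (ties count as half).

```python
from typing import Dict, Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    # Count pairwise wins; a tie contributes half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def referral_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Compute the measures listed in the abstract for one binary referral model.

    `threshold` is an illustrative assumption, not a value taken from the study.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on referred patients
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on non-referred patients
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0

    return {
        "auroc": auroc(y_true, y_score),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Toy data (made up): 1 = patient was referred, scores are model outputs.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.7]
toy_metrics = referral_metrics(toy_labels, toy_scores)
```

In this sketch, one such set of measures would be computed per service type (behavioral health, dietitian, social work, other), since the abstract describes a separate decision model for each.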
