Implications of non-stationarity on predictive modeling using EHRs

The rapidly increasing volume of clinical information captured in Electronic Health Records (EHRs) has led to the application of increasingly sophisticated models for purposes such as disease subtype discovery and predictive modeling. However, because EHR adoption is still growing, much of the data available for such purposes in the near future will come from a period during which both the practice of medicine and the clinical use of EHRs are in flux, owing to historic changes in technology and incentives. In this work, we explore the implications of this phenomenon, known as non-stationarity, for predictive modeling. We focus on predicting delayed wound healing from the data available in the EHR during the first week of care in outpatient wound care centers, using a large dataset covering more than 150,000 individual wounds from 59,958 patients seen over four years. We manipulate the degree of non-stationarity seen by the model development process by changing how the data are split into training and test sets. We demonstrate that non-stationarity can lead to substantially different conclusions about the relative merits of different models, both in predictive power and in the calibration of their posterior probabilities. Under the non-stationarity exhibited in this dataset, the performance advantage of complex methods such as stacking over the best simple classifier disappears. Ignoring non-stationarity can thus lead to sub-optimal model selection for this task.
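How the split changes what an evaluation sees can be illustrated with a toy comparison of two splitting schemes: a random split and a chronological (temporal) split. The sketch below is illustrative only. The `Wound` record and its fields, the synthetic cohort (whose delayed-healing rate is assumed to rise linearly from 0.2 to 0.5 over four years), and the constant-prevalence baseline are all hypothetical stand-ins, not the data or models used in this work. A random split mixes every time period into both sets, so the training-set outcome rate matches the test-set rate; a temporal split trains on earlier wounds and tests on later ones, exposing the drift. The Brier score, which penalizes poorly calibrated probabilities, is used to compare the two.

```python
import random
from dataclasses import dataclass


@dataclass
class Wound:
    # Hypothetical record: days since the start of the (synthetic) study window
    # at first visit, and whether healing was delayed (1) or not (0).
    first_visit_day: int
    delayed: int


def random_split(records, test_fraction=0.25, seed=0):
    """Hold out a random subset; train and test then span the same time period."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def temporal_split(records, cutoff_day):
    """Train on wounds first seen before the cutoff, test on those seen on or after it."""
    train = [r for r in records if r.first_visit_day < cutoff_day]
    test = [r for r in records if r.first_visit_day >= cutoff_day]
    return train, test


def brier_score(outcomes, probabilities):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(outcomes, probabilities)) / len(outcomes)


def evaluate_prevalence_baseline(train, test):
    """Predict the training-set outcome rate for every test wound; return the test Brier score."""
    rate = sum(r.delayed for r in train) / len(train)
    return brier_score([r.delayed for r in test], [rate] * len(test))


def simulate_drifting_cohort(n=4000, n_days=4 * 365, seed=1):
    """Synthetic cohort (assumption) whose delayed-healing rate rises from 0.2 to 0.5."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        day = rng.randrange(n_days)
        rate = 0.2 + 0.3 * day / n_days  # assumed linear drift in the outcome rate
        records.append(Wound(day, int(rng.random() < rate)))
    return records


if __name__ == "__main__":
    cohort = simulate_drifting_cohort()
    brier_random = evaluate_prevalence_baseline(*random_split(cohort))
    brier_temporal = evaluate_prevalence_baseline(*temporal_split(cohort, cutoff_day=3 * 365))
    print(f"Brier score, random split:   {brier_random:.3f}")
    print(f"Brier score, temporal split: {brier_temporal:.3f}")
```

Under the assumed drift, the temporal split yields the higher (worse) Brier score: the rate learned from earlier wounds underestimates the rate among later ones, a calibration loss that the random split hides. This is a miniature of the general point; real classifiers and real EHR drift are far richer than this one-parameter baseline.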