Large-Scale Study of Temporal Shift in Health Insurance Claims

Most machine learning models for predicting clinical outcomes are developed using historical data. Yet, even if these models are deployed in the near future, dataset shift over time may degrade their performance. To capture this phenomenon, we consider a task (that is, an outcome to be predicted at a particular time point) to be non-stationary if a historical model is no longer optimal for predicting that outcome. We build an algorithm to test for temporal shift either at the population level or within a discovered sub-population. Then, we construct a meta-algorithm to perform a retrospective scan for temporal shift on a large collection of tasks. Our algorithms enable what is, to our knowledge, the first comprehensive evaluation of temporal shift in healthcare. We create 1,010 tasks by evaluating 242 healthcare outcomes for temporal shift from 2015 to 2020 on a health insurance claims dataset. Of these tasks, 9.7% show temporal shifts at the population level, and 93.0% have some sub-population affected by shifts. We examine case studies to understand the clinical implications. Our analysis highlights how prevalent temporal shifts are in healthcare.
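To make the idea of a population-level test and a scan over many tasks concrete, the following is a minimal sketch and not the authors' algorithm. It assumes that shift is measured as the AUC gain of a model trained on current data over a historical model on the same held-out patients, that both models output scores on the same scale (for example, probabilities), and that a paired permutation test followed by Benjamini-Hochberg correction is an acceptable stand-in for the paper's procedure. All function names (`auc`, `shift_p_value`, `benjamini_hochberg`) are hypothetical.

```python
import numpy as np


def auc(y_true, scores):
    """Rank-based AUC (Mann-Whitney U), using average ranks for tied scores."""
    y_true = np.asarray(y_true).astype(int)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    for value in np.unique(scores):
        tied = scores == value
        if tied.sum() > 1:
            ranks[tied] = ranks[tied].mean()
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def shift_p_value(y, s_hist, s_curr, n_perm=1000, seed=0):
    """One-sided paired permutation test: does the current model beat the
    historical model in AUC by more than chance would explain?

    Assumption: under the null the two models are exchangeable, so their
    scores can be swapped independently for each patient.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    s_hist = np.asarray(s_hist, dtype=float)
    s_curr = np.asarray(s_curr, dtype=float)
    observed = auc(y, s_curr) - auc(y, s_hist)
    count = 0
    for _ in range(n_perm):
        swap = rng.random(len(y)) < 0.5
        a = np.where(swap, s_curr, s_hist)
        b = np.where(swap, s_hist, s_curr)
        if auc(y, b) - auc(y, a) >= observed:
            count += 1
    # Adding 1 to numerator and denominator keeps the p-value above zero.
    return (count + 1) / (n_perm + 1)


def benjamini_hochberg(p_values, alpha=0.05):
    """Boolean mask of tasks flagged as shifted, controlling FDR at alpha."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()
        reject[order[: k + 1]] = True
    return reject
```

In this sketch, a scan over a collection of tasks would call `shift_p_value` once per (outcome, time point) pair and pass the resulting p-values to `benjamini_hochberg`. A sub-population variant would apply the same test within a candidate subgroup, which this example does not attempt to discover.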
