Prediction and Inference With Missing Data in Patient Alert Systems

Abstract

We describe the Bedside Patient Rescue (BPR) project, whose goal is to predict the risk of adverse events for non-intensive care unit (non-ICU) patients using approximately 100 variables (vitals, lab results, assessments, etc.). Most patients have several missing predictor values, which in the health sciences is the norm rather than the exception. We present a Bayesian approach that addresses many of the shortcomings of standard approaches to missing predictors: (i) accounting for the uncertainty due to imputation is straightforward in the Bayesian paradigm; (ii) the predictor distribution is modeled flexibly as an infinite mixture of normals, with latent variables that explicitly account for discrete predictors (as in multivariate probit regression models); and (iii) certain missing not at random (MNAR) situations can be handled effectively by including the missingness indicators in the predictor distribution, where they serve only to inform the distribution of the missing variables. The proposed approach also provides a full predictive distribution for each patient's risk, one that incorporates the uncertainty inherent in the imputation. We can therefore ask questions such as: Could this individual be at high risk, with too much information missing to know for sure? How much would obtaining a particular missing value reduce the uncertainty in the risk prediction? Applied to the BPR problem, the approach shows excellent predictive capability for identifying deteriorating patients. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
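The Python sketch below illustrates the last two questions on a toy scale. It is not the authors' model. A single multivariate normal component stands in for the infinite normal mixture. The risk model is a logistic regression with made-up coefficients (`BETA`), and the three predictors `x1`, `x2`, `x3` are hypothetical. The binary predictor `x3` is represented, probit-style, as `x3 = 1{z > 0}` for a latent Gaussian `z`. For a patient with only `x1` observed, the missing `x2` and `z` are drawn jointly from their conditional normal distribution. This yields a distribution of risk rather than a point estimate. The hypothetical function `expected_sd_after_observing_x2` then performs a small preposterior analysis. It averages the risk standard deviation over plausible values of `x2`, estimating how much measuring `x2` would shrink the uncertainty.

```python
import numpy as np

# Hypothetical joint model for (x1, x2, z): ONE Gaussian component standing in
# for the infinite normal mixture; z is the latent variable behind x3 = 1{z > 0}.
MU = np.array([0.0, 0.0, 0.0])
SIGMA = np.array([[1.0, 0.6, 0.3],
                  [0.6, 1.0, 0.4],
                  [0.3, 0.4, 1.0]])
# Hypothetical logistic risk coefficients: intercept, x1, x2, x3.
BETA = np.array([-2.0, 0.8, 1.2, 1.0])


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def conditional_normal(mu, sigma, obs_idx, obs_val, mis_idx):
    """Mean and covariance of the missing block given the observed block."""
    s_oo = sigma[np.ix_(obs_idx, obs_idx)]
    s_mo = sigma[np.ix_(mis_idx, obs_idx)]
    s_mm = sigma[np.ix_(mis_idx, mis_idx)]
    gain = s_mo @ np.linalg.inv(s_oo)
    mean = mu[mis_idx] + gain @ (obs_val - mu[obs_idx])
    cov = s_mm - gain @ s_mo.T
    return mean, cov


def risk(x1, x2, z):
    """Logistic risk; the binary predictor x3 is obtained by thresholding z."""
    x3 = (z > 0).astype(float)
    return sigmoid(BETA[0] + BETA[1] * x1 + BETA[2] * x2 + BETA[3] * x3)


def risk_draws_given_x1(x1, n, rng):
    """Risk draws when only x1 is observed; x2 and z are imputed jointly."""
    mean, cov = conditional_normal(MU, SIGMA, [0], np.array([x1]), [1, 2])
    draws = rng.multivariate_normal(mean, cov, size=n)
    return risk(x1, draws[:, 0], draws[:, 1]), draws[:, 0]


def risk_draws_given_x1_x2(x1, x2, n, rng):
    """Risk draws when x1 and x2 are observed; only z (hence x3) is imputed."""
    mean, cov = conditional_normal(MU, SIGMA, [0, 1], np.array([x1, x2]), [2])
    z = rng.normal(mean[0], np.sqrt(cov[0, 0]), size=n)
    return risk(x1, x2, z)


def expected_sd_after_observing_x2(x1, n_outer, n_inner, rng):
    """Average risk SD over plausible x2 values (preposterior analysis)."""
    _, x2_draws = risk_draws_given_x1(x1, n_outer, rng)
    sds = [risk_draws_given_x1_x2(x1, x2, n_inner, rng).std() for x2 in x2_draws]
    return float(np.mean(sds))


rng = np.random.default_rng(0)
x1_obs = 1.5
r, _ = risk_draws_given_x1(x1_obs, 20_000, rng)
sd_after = expected_sd_after_observing_x2(x1_obs, 500, 2_000, rng)
print(f"risk mean {r.mean():.3f}, sd {r.std():.3f}, P(risk > 0.5) {np.mean(r > 0.5):.3f}")
print(f"expected risk sd if x2 were observed: {sd_after:.3f}")
```

By the law of total variance and Jensen's inequality, the expected posterior SD cannot exceed the current predictive SD, apart from Monte Carlo error. The gap between the two printed values is therefore a rough measure of how informative the missing measurement would be for this patient.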
