Prediction of In-hospital Mortality in Emergency Department Patients With Sepsis: A Local Big Data-Driven, Machine Learning Approach.

OBJECTIVES Predictive analytics in emergency care has mostly been limited to the use of clinical decision rules (CDRs) in the form of simple heuristics and scoring systems. In the development of CDRs, limitations in analytic methods and concerns with usability have generally constrained models to a preselected small set of variables judged to be clinically relevant and to rules that are easily calculated. Furthermore, CDRs frequently suffer from questions of generalizability, take years to develop, and lack the ability to be updated as new information becomes available. Newer analytic and machine learning techniques capable of harnessing the large number of variables that are already available through electronic health records (EHRs) may better predict patient outcomes and facilitate automation and deployment within clinical decision support systems. In this proof-of-concept study, a local, big data-driven, machine learning approach is compared to existing CDRs and traditional analytic methods using the prediction of sepsis in-hospital mortality as the use case.
METHODS This was a retrospective study of adult ED visits admitted to the hospital meeting criteria for sepsis from October 2013 to October 2014. Sepsis was defined as meeting criteria for systemic inflammatory response syndrome with an infectious admitting diagnosis in the ED. ED visits were randomly partitioned into an 80%/20% split for training and validation. A random forest model (machine learning approach) was constructed using over 500 clinical variables from data available within the EHRs of four hospitals to predict in-hospital mortality. The machine learning prediction model was then compared to a classification and regression tree (CART) model, logistic regression model, and previously developed prediction tools on the validation data set using area under the receiver operating characteristic curve (AUC) and chi-square statistics.
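The partitioning and evaluation steps described in METHODS can be illustrated with a minimal sketch. This is not the authors' code: the function names `split_80_20` and `auc`, the seed, and the toy labels and scores are all assumptions made for illustration, and model fitting (random forest, CART, logistic regression) is left out because the abstract does not say which software was used. The AUC is computed from its rank interpretation: the probability that a randomly chosen visit ending in death receives a higher predicted risk than a randomly chosen visit ending in survival.

```python
import random


def split_80_20(items, seed=0):
    """Randomly partition items into an 80% training list and a 20% validation list.

    The seed is an arbitrary choice for reproducibility, not a value from the study.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.8 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 = died in hospital, 0 = survived (hypothetical encoding).
    scores: predicted risk from any model; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one death and one survivor")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Splitting 5,278 visit identifiers gives group sizes of 4,222 and 1,056.
train, valid = split_80_20(range(5278))
print(len(train), len(valid))  # 4222 1056

# Toy example: 3 of the 4 death/survivor pairs are ranked correctly.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise form is quadratic in the number of visits, which is acceptable for a validation set of about a thousand visits; a rank-based formula or a library routine would be the usual choice at larger scale. Confidence intervals and between-model comparisons of correlated AUCs need a dedicated method and are not covered by this sketch.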
RESULTS There were 5,278 visits among 4,676 unique patients who met criteria for sepsis. Of the 4,222 visits in the training group, 210 (5.0%) ended in death during hospitalization, and of the 1,056 visits in the validation group, 50 (4.7%) did. The AUCs with 95% confidence intervals (CIs) for the different models were as follows: random forest model, 0.86 (95% CI = 0.82 to 0.90); CART model, 0.69 (95% CI = 0.62 to 0.77); logistic regression model, 0.76 (95% CI = 0.69 to 0.82); CURB-65, 0.73 (95% CI = 0.67 to 0.80); MEDS, 0.71 (95% CI = 0.63 to 0.77); and mREMS, 0.72 (95% CI = 0.65 to 0.79). The AUC of the random forest model differed significantly from that of every other model (p ≤ 0.003 for all comparisons).
CONCLUSIONS In this proof-of-concept study, a local big data-driven, machine learning approach outperformed existing CDRs as well as traditional analytic techniques for predicting in-hospital mortality of ED patients with sepsis. Future research should prospectively evaluate the effectiveness of this approach and whether it translates into improved clinical outcomes for high-risk sepsis patients. The methods developed serve as an example of a new model for predictive analytics in emergency care that can be automated, applied to other clinical outcomes of interest, and deployed in EHRs to enable locally relevant clinical predictions.
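As a quick consistency check on the counts reported in RESULTS, the short sketch below recomputes the group sizes and mortality percentages from the published figures. The variable names are illustrative; the numbers are taken directly from the abstract. Because the training and validation groups sum to 5,278, they partition visits rather than the 4,676 unique patients.

```python
# Counts as reported in RESULTS.
total_visits = 5278
train_n, train_deaths = 4222, 210
valid_n, valid_deaths = 1056, 50

# The two groups together account for every visit.
groups_sum_to_total = (train_n + valid_n == total_visits)

# Percentages rounded to one decimal place, as in the abstract.
train_share = round(100 * train_n / total_visits, 1)    # share of visits used for training
train_rate = round(100 * train_deaths / train_n, 1)     # in-hospital mortality, training
valid_rate = round(100 * valid_deaths / valid_n, 1)     # in-hospital mortality, validation

print(groups_sum_to_total, train_share, train_rate, valid_rate)  # True 80.0 5.0 4.7
```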
