Length-of-stay and mortality prediction for a major hospital through interpretable machine learning

Importance: Understanding the discharge process at a hospital level is key in improving efficiency and quality of care.
Objective: Investigate how machine learning can help anticipate various aspects of patient discharges, from predicting length-of-stay to discharge destination or hospital mortality.
Design: Retrospective study performed on inpatients admitted at Beth Israel Deaconess Medical Center (BIDMC) between January 2017 and August 2018.
Setting: Single-center study in a large academic medical center in the Boston area.
Participants: We included inpatients admitted at BIDMC between January 2017 and August 2018, excluding patients admitted into psychiatry, obstetrics and newborns. The final cohort consisted of 63,432 unique admissions (41,726 unique patients).
Main outcomes and measures: We predicted whether a patient will be discharged in the next 24 or 48 hours, whether the patient will stay more than 7 or 14 days, and the discharge destination among home, home with services, extended care facility and hospital mortality. Data were collected retrospectively from electronic health records (EHRs).
Methods and results: We used data from 63,432 admissions at BIDMC (50.0% female, median age 64 years old, median length-of-stay 3.12 days) to answer four length-of-stay-related questions, as well as to predict discharge destination. We applied five different machine learning algorithms. With the best performing method, we predict same-day discharges (remaining length-of-stay < 1 day) with an area under the receiver operating characteristic curve (AUC) of 0.843 (95% CI 0.839-0.847), next-day discharges (remaining length-of-stay < 2 days) with an AUC of 0.819-0.826, and long-stay patients (overall length-of-stay > 7/14 days) with an AUC of 0.816-0.825 and 0.820-0.833 respectively. Similarly, we accurately predict discharge destination (weighted AUC of 0.835-0.839), hospital mortality (AUC 0.959-0.964) and discharge to extended care facility (AUC 0.852-0.858).
Conclusions: We are able to accurately identify same-day or next-day discharges, long-stay patients and predict discharge destination. Though less accurate, simpler and interpretable models, such as decision trees, demonstrate very good predictive power, provide insights on discharge barriers and have been instrumental in interacting with care providers. In addition, those models are, compared to deep learning approaches, frugal in data and computational power and provide production-level analytics for EHRs.
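The results above are reported as AUC values with 95% confidence intervals. The abstract does not say how those intervals were obtained, so the following is only a minimal sketch of one common approach, a percentile bootstrap over test-set predictions. The function names (`auc`, `bootstrap_auc_ci`) and the synthetic labels and scores are illustrative assumptions, not the study's code or data.

```python
import random


def auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney U) identity; ties get average ranks."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic example (hypothetical data): 1 = discharged within 24 hours.
_rng = random.Random(42)
demo_labels = [1 if _rng.random() < 0.3 else 0 for _ in range(500)]
# Scores are noisy but higher on average for positive cases.
demo_scores = [0.4 * y + _rng.random() for y in demo_labels]

point = auc(demo_labels, demo_scores)
low, high = bootstrap_auc_ci(demo_labels, demo_scores, n_boot=500)
print(f"AUC = {point:.3f} (95% CI {low:.3f}-{high:.3f})")
```

With a cohort as large as the one described here, the interval narrows considerably, which is consistent with the tight ranges reported above; an analytic estimate such as DeLong's method would be a reasonable alternative to resampling.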
