Interpretable Multi-Task Deep Neural Networks for Dynamic Predictions of Postoperative Complications

Accurate prediction of postoperative complications can inform shared decisions between patients and surgeons about the appropriateness of surgery, preoperative risk-reduction strategies, and postoperative resource use. Traditional predictive analytic tools are limited by suboptimal performance and poor usability. We hypothesized that novel deep learning techniques would outperform logistic regression models in predicting postoperative complications. In a single-center longitudinal cohort of 43,943 adult patients undergoing 52,529 major inpatient surgeries, deep learning yielded greater discrimination than logistic regression for all nine complications. Predictive performance was strongest when models leveraged the full spectrum of preoperative and intraoperative physiologic time-series data from the electronic health record. A single multi-task deep learning model outperformed separate models trained on individual complications. Interpretability analysis using integrated gradients showed that missing data carried substantial importance for model predictions. Interpretable, multi-task deep neural networks made accurate, patient-level predictions that have the potential to augment surgical decision-making.
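The abstract names two technical ingredients: a single network shared across nine complications, and integrated gradients as the interpretability mechanism. The following is a minimal sketch of how these fit together, not the authors' model or code. It rests on several assumptions. The architecture is a hypothetical toy with one shared tanh layer and nine sigmoid heads, using random weights. Missing data are encoded by zero imputation plus a binary missingness indicator per variable. The joint loss is a summed binary cross-entropy. Attributions are computed against an all-zero baseline. Integrated gradients assigns each feature the quantity (x_i - x'_i) multiplied by the average gradient along the straight path from baseline x' to input x, approximated here with a midpoint Riemann sum.

```python
import numpy as np

# Hypothetical toy multi-task network (NOT the authors' trained model):
# one shared tanh hidden layer feeding nine sigmoid heads, one per complication.
# Weights are random and exist only to make the example runnable.
rng = np.random.default_rng(0)
N_RAW, N_HIDDEN, N_TASKS = 3, 8, 9
N_FEATURES = 2 * N_RAW  # raw values plus one missingness indicator per variable
W1 = rng.normal(scale=0.5, size=(N_FEATURES, N_HIDDEN))
b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(scale=0.5, size=(N_HIDDEN, N_TASKS))
b2 = np.zeros(N_TASKS)


def add_missingness_indicators(x_raw):
    """Assumed encoding: zero-impute NaNs and append a 0/1 missingness flag per variable."""
    mask = np.isnan(x_raw).astype(float)
    return np.concatenate([np.nan_to_num(x_raw, nan=0.0), mask])


def forward(x):
    """Return the shared hidden representation and all nine task probabilities."""
    h = np.tanh(x @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
    return h, p


def multitask_bce(p, y, eps=1e-7):
    """Assumed joint objective: binary cross-entropy summed over all task heads."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def grad_wrt_input(x, task):
    """Analytic gradient of one task's predicted probability with respect to x."""
    h, p = forward(x)
    dp_dz = p[task] * (1.0 - p[task])   # sigmoid derivative for this head
    dz_dh = W2[:, task]                 # this head's weights on the shared layer
    dh_da = 1.0 - h ** 2                # tanh derivative
    return (W1 @ (dz_dh * dh_da)) * dp_dz


def integrated_gradients(x, baseline, task, steps=200):
    """Midpoint Riemann approximation of integrated gradients along the straight path."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array([grad_wrt_input(baseline + a * (x - baseline), task)
                      for a in alphas])
    return (x - baseline) * grads.mean(axis=0)


if __name__ == "__main__":
    x = add_missingness_indicators(np.array([1.2, np.nan, -0.4]))
    baseline = np.zeros_like(x)          # assumed all-zero reference input
    attr = integrated_gradients(x, baseline, task=0)
    _, p_x = forward(x)
    _, p_b = forward(baseline)
    print("attributions:", np.round(attr, 4))
    # Completeness: attributions should sum to F(x) - F(baseline).
    print("sum:", attr.sum(), "vs", p_x[0] - p_b[0])
```

In this toy setup, the missing second variable is zero-imputed and the baseline is zero, so its raw-value attribution is exactly zero. Its missingness indicator, however, can receive nonzero attribution. Under an encoding like this, that is one way missing data could surface as important in an integrated-gradients analysis. Because all heads share one hidden layer, the same attribution routine can be run separately for each complication by changing `task`.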