Predicting Complications in Critical Care Using Heterogeneous Clinical Data

Patients in hospitals, particularly in critical care, are susceptible to many complications that affect morbidity and mortality. Digitized clinical data in electronic medical records can be used to develop machine learning models that identify patients at risk of complications early, so that care can be prioritized to prevent them. However, clinical data from heterogeneous sources within hospitals pose significant modeling challenges. In particular, unstructured clinical notes are a valuable source of information because they contain regular assessments of the patient's condition, but they use inconsistent abbreviations and lack the structure of formal documents. Our contributions in this paper are twofold. First, we present a new preprocessing technique for extracting features from informal clinical notes that can be used in a classification model to identify patients at risk of developing complications. Second, we explore the use of collective matrix factorization, a multi-view learning technique, to model heterogeneous clinical data: text-based features in combination with other measurements, such as clinical investigations, comorbidities, and demographic data. We present a detailed case study on postoperative respiratory failure using more than 700 patient records from the MIMIC II database. Our experiments demonstrate the efficacy of our preprocessing technique in extracting discriminatory features from clinical notes, as well as the benefits of multi-view learning in combining clinical measurements with text data for predicting complications.
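To make the multi-view idea concrete, the following is a minimal sketch of collective matrix factorization, not the implementation used in this paper. It assumes two patient-by-feature matrices (one for text-derived features, one for clinical measurements) that share a single low-rank patient factor matrix, and fits a squared-error objective with L2 regularization by alternating least squares. The function name `collective_mf` and the parameters `rank`, `reg`, and `n_iters` are hypothetical; the variant used in the paper may differ in loss, sparsity structure, and inference.

```python
import numpy as np

def collective_mf(views, rank=5, reg=0.1, n_iters=50, seed=0):
    """Illustrative collective matrix factorization (assumed formulation).

    views: list of arrays, each of shape (n_patients, n_features_v),
           with rows aligned to the same patients.
    Returns the shared patient factors U (n_patients, rank) and a list of
    per-view loadings V_v (n_features_v, rank), so that X_v ~ U @ V_v.T.
    """
    rng = np.random.default_rng(seed)
    n = views[0].shape[0]
    U = rng.normal(scale=0.1, size=(n, rank))
    ridge = reg * np.eye(rank)
    Vs = []
    for _ in range(n_iters):
        # Update each view's loadings with the shared factors held fixed:
        # V_v = X_v^T U (U^T U + reg I)^{-1}
        gram_u = U.T @ U + ridge
        Vs = [np.linalg.solve(gram_u, U.T @ X).T for X in views]
        # Update the shared factors using all views jointly:
        # U = (sum_v X_v V_v) (sum_v V_v^T V_v + reg I)^{-1}
        gram_v = sum(V.T @ V for V in Vs) + ridge
        rhs = sum(X @ V for X, V in zip(views, Vs))
        U = np.linalg.solve(gram_v, rhs.T).T
    return U, Vs

if __name__ == "__main__":
    # Synthetic stand-ins for the two views (not real patient data).
    rng = np.random.default_rng(1)
    latent = rng.normal(size=(40, 3))
    X_text = latent @ rng.normal(size=(3, 12))  # text-derived features
    X_clin = latent @ rng.normal(size=(3, 6))   # clinical measurements
    U, Vs = collective_mf([X_text, X_clin], rank=3, reg=0.01, n_iters=100)
    print(U.shape, [V.shape for V in Vs])
```

Under these assumptions, the rows of `U` act as a joint patient representation that could be passed to any downstream classifier; sharing `U` across views is what lets information from notes and measurements inform the same embedding.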
