Deep dynamic imputation of clinical time series for mortality prediction

Abstract
Missing values in clinical time-series data are pervasive and inevitable; they not only make the data more complex and difficult to analyze but can also lead to biased results. To tackle these two problems, researchers have been exploring recurrent neural network (RNN)-based methods that handle missing values directly, with the aim of achieving state-of-the-art performance. However, these methods have two practical drawbacks: 1) they struggle with time-series data containing multiple irregular and abnormal values, and 2) they do not thoroughly consider the patterns that may be present in missing clinical data. Moreover, to the best of our knowledge, none of these methods has been explicitly designed to dynamically optimize imputation quality for better performance in clinical time-series analytics. Taking the quality of the imputed values into account, we propose a two-step integrated imputation-prediction model based on gated recurrent units (GRUs) for medical prediction tasks. In the first step, the missing values are imputed by a model based on a replenished GRU with a hidden state decay mechanism (RGRU-D), and the imputed values are then evaluated through two additional layers. In the second step, the optimized imputed values are used to predict the risk of mortality in critically ill patients. Within a single integrated deep architecture, our model effectively fills in missing values by exploiting masking, time-interval, burstiness, and cumulative missing-rate variables. Extensive experiments on a real-world ICU dataset demonstrate that our model outperforms the compared methods in both imputation quality and prediction accuracy.
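
The abstract names the auxiliary inputs (masking, time interval, burstiness, cumulative missing rate) and a decay mechanism but gives no formulas. The following is one plausible way to compute them, written as a minimal plain-Python sketch under stated assumptions. It assumes the GRU-D definitions from "Recurrent Neural Networks for Multivariate Time Series with Missing Values" for the mask, the time interval, and input and hidden-state decay; the Goh and Barabási burstiness coefficient B = (sigma - mu) / (sigma + mu) of inter-observation gaps; and a running fraction of missing steps as the cumulative missing rate. It is not the authors' RGRU-D: all function names are hypothetical, the decay weights `w`, `b`, `W`, and `bias` are fixed placeholders rather than learned parameters, and the fallback mean is computed per series instead of from training data.

```python
import math
import statistics
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def missingness_features(
    values: List[List[Optional[float]]], times: List[float]
) -> Tuple[List[List[int]], Matrix, Matrix]:
    """Masks m, time intervals delta (GRU-D definition), and cumulative missing rate.

    values[t][d] is variable d at time step t; None marks a missing value.
    """
    T, D = len(values), len(values[0])
    mask = [[0 if v is None else 1 for v in row] for row in values]
    delta = [[0.0] * D for _ in range(T)]
    cum_rate = [[0.0] * D for _ in range(T)]
    n_missing = [0] * D
    for t in range(T):
        for d in range(D):
            if t > 0:
                gap = times[t] - times[t - 1]
                # The interval keeps growing while the variable stays unobserved.
                delta[t][d] = gap if mask[t - 1][d] else gap + delta[t - 1][d]
            n_missing[d] += 1 - mask[t][d]
            # Assumed definition: fraction of steps 0..t at which d was missing.
            cum_rate[t][d] = n_missing[d] / (t + 1)
    return mask, delta, cum_rate


def burstiness(obs_times: List[float]) -> float:
    """Goh-Barabasi coefficient B = (sigma - mu) / (sigma + mu) of inter-observation gaps."""
    gaps = [b - a for a, b in zip(obs_times, obs_times[1:])]
    if len(gaps) < 2:
        return 0.0  # assumption: too few gaps to estimate burstiness
    mu, sigma = statistics.mean(gaps), statistics.pstdev(gaps)
    return (sigma - mu) / (sigma + mu) if sigma + mu > 0 else 0.0


def decay_impute(values, mask, delta, w: float = 0.5, b: float = 0.0) -> Matrix:
    """GRU-D-style input decay: missing x = gamma * x_last + (1 - gamma) * x_mean."""
    T, D = len(values), len(values[0])
    imputed = [[0.0] * D for _ in range(T)]
    for d in range(D):
        observed = [values[t][d] for t in range(T) if mask[t][d]]
        # Assumption: per-series mean (GRU-D uses the training-set mean).
        mean = sum(observed) / len(observed) if observed else 0.0
        last = mean  # before the first observation, fall back to the mean
        for t in range(T):
            if mask[t][d]:
                last = values[t][d]
                imputed[t][d] = last
            else:
                gamma = math.exp(-max(0.0, w * delta[t][d] + b))
                imputed[t][d] = gamma * last + (1.0 - gamma) * mean
    return imputed


def decay_hidden(h: List[float], delta_t: List[float], W: Matrix, bias: List[float]) -> List[float]:
    """Hidden-state decay: h <- exp(-max(0, W @ delta_t + bias)) * h, elementwise."""
    return [
        h_j * math.exp(-max(0.0, sum(w * dl for w, dl in zip(W_j, delta_t)) + b_j))
        for h_j, W_j, b_j in zip(h, W, bias)
    ]


if __name__ == "__main__":
    values = [[1.0, None], [None, 2.0], [3.0, None], [None, None]]
    times = [0.0, 1.0, 2.0, 4.0]
    mask, delta, rate = missingness_features(values, times)
    print(delta)  # [[0.0, 0.0], [1.0, 1.0], [2.0, 1.0], [2.0, 3.0]]
    print(round(decay_impute(values, mask, delta)[1][0], 4))  # 1.3935
```

Plain lists keep the example dependency-free; a real model would vectorize these steps and learn the decay parameters jointly with the GRU rather than fixing them by hand.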
