Uncertainty-Aware Variational-Recurrent Imputation Network for Clinical Time Series

Electronic health records (EHR) consist of longitudinal clinical observations characterized by sparsity, irregularity, and high dimensionality, which are major obstacles to drawing reliable downstream clinical outcomes. Although a great number of imputation methods exist to tackle these issues, most of them ignore correlated features and temporal dynamics, and entirely set aside the uncertainty. Since missing value estimates carry the risk of being inaccurate, it is appropriate for a method to handle less certain information differently from reliable data. In that regard, we can use the uncertainty in estimating the missing values as a fidelity score, which is further utilized to alleviate the risk of biased missing value estimates. In this work, we propose a novel variational-recurrent imputation network, which unifies an imputation network and a prediction network by taking into account correlated features, temporal dynamics, and uncertainty. Specifically, we leverage a deep generative model in the imputation, which is based on the distribution among variables, and a recurrent imputation network to exploit the temporal relations, in conjunction with utilization of the uncertainty. We validated the effectiveness of our proposed model on two publicly available real-world EHR datasets, 1) PhysioNet Challenge 2012 and 2) MIMIC-III, and compared the results with other competing state-of-the-art methods in the literature.
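
The idea of treating estimation uncertainty as a fidelity score can be illustrated with a minimal sketch. This is not the network described above: the function name, its inputs, and the mapping exp(-sigma) from uncertainty to fidelity are assumptions chosen only for illustration. Observed entries keep their measured value with full fidelity, while missing entries take a model estimate whose fidelity shrinks as its uncertainty grows.

```python
import math


def fidelity_weighted_impute(values, mask, imputed_mean, imputed_std):
    """Fill missing entries and attach a fidelity score to every entry.

    values       : raw measurements (entries where mask == 0 are ignored)
    mask         : 1 if the entry was observed, 0 if it is missing
    imputed_mean : estimate for every entry (hypothetical model output)
    imputed_std  : uncertainty of that estimate (hypothetical model output)
    """
    filled, fidelity = [], []
    for x, m, mu, sigma in zip(values, mask, imputed_mean, imputed_std):
        if m == 1:
            # Observed data is trusted fully.
            filled.append(x)
            fidelity.append(1.0)
        else:
            # Missing data is replaced by the estimate; exp(-sigma) is an
            # illustrative (assumed) way to turn uncertainty into a score
            # in (0, 1], not the formulation used in the paper.
            filled.append(mu)
            fidelity.append(math.exp(-sigma))
    return filled, fidelity


# Toy example: one heart-rate series with a missing middle measurement.
heart_rate = [80.0, float("nan"), 90.0]
observed = [1, 0, 1]
estimate = [81.0, 85.0, 88.0]
uncertainty = [0.1, 0.5, 0.2]

filled, fidelity = fidelity_weighted_impute(
    heart_rate, observed, estimate, uncertainty
)
```

A downstream predictor could then scale each input by its fidelity score, so that uncertain estimates influence the outcome less than observed measurements.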
