A Review of Deep Learning Methods for Irregularly Sampled Medical Time Series Data

Irregularly sampled time series (ISTS) data have irregular temporal intervals between observations and different sampling rates across sequences. ISTS commonly appear in healthcare, economics, and geoscience. In the medical setting in particular, the widely used Electronic Health Records (EHRs) contain abundant irregularly sampled medical time series (ISMTS) data. Developing deep learning methods for EHR data is critical for personalized treatment, precise diagnosis, and medical management. However, applying deep learning models directly to ISMTS data is challenging. On the one hand, ISMTS data exhibit both intra-series and inter-series relations, so both local and global structures should be considered. On the other hand, methods should balance task accuracy against model complexity while retaining generality and interpretability. Many existing works have addressed these problems and achieved good results. In this paper, we review these deep learning methods from two perspectives: technology and task. From the technology-driven perspective, we group them into two categories: missing data-based methods and raw data-based methods. From the task-driven perspective, we also group them into two categories: data imputation-oriented methods and downstream task-oriented methods. For each category, we point out its advantages and disadvantages. Moreover, we implement some representative methods and compare them on four medical datasets with two tasks. Finally, we discuss the challenges and opportunities in this area.
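As a rough illustration of the two technology-driven views named above, the following minimal sketch builds both representations for a toy record. The variable names, the toy values, the one-hour bin width, and the helper functions `time_gaps` and `to_grid` are all assumptions made for this example; they are not taken from any of the reviewed methods.

```python
from typing import List, Optional, Tuple

Series = List[Tuple[float, float]]  # (time in hours, observed value)

# Hypothetical toy record: two variables sampled at different, irregular times.
heart_rate: Series = [(0.0, 82.0), (0.4, 85.0), (2.5, 90.0)]
lactate: Series = [(1.2, 2.1)]


def time_gaps(series: Series) -> List[float]:
    """Raw-data view: keep every observation and record the gap since the previous one."""
    times = [t for t, _ in series]
    return [0.0] + [round(b - a, 6) for a, b in zip(times, times[1:])]


def to_grid(series: Series, n_bins: int, bin_width: float = 1.0):
    """Missing-data view: place observations on a regular grid.

    Bins without an observation hold None; if several observations fall
    into one bin, the last one wins (an arbitrary choice for this sketch).
    Returns the gridded values and a 0/1 observation mask.
    """
    values: List[Optional[float]] = [None] * n_bins
    for t, v in series:
        idx = int(t // bin_width)
        if 0 <= idx < n_bins:
            values[idx] = v
    mask = [0 if v is None else 1 for v in values]
    return values, mask


if __name__ == "__main__":
    print(time_gaps(heart_rate))
    print(to_grid(heart_rate, n_bins=3))
    print(to_grid(lactate, n_bins=3))
```

In this sketch the raw-data view keeps the original timestamps (here as gaps between consecutive observations), while the missing-data view trades them for a fixed grid plus a mask marking which bins were actually observed.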
