Related Papers

Multivariate Time Series Imputation with Generative Adversarial Networks

Abstract: Multivariate time series usually contain a large number of missing values, which hinders the application of advanced analysis methods to such data. Conventional approaches to handling missing values, including mean/zero imputation, case deletion, and matrix factorization-based imputation, cannot model the temporal dependencies or the complex distributions of multivariate time series. In this paper, we treat missing value imputation as a data generation problem. Inspired by the success of Generative Adversarial Networks (GAN) in image generation, we propose to learn the overall distribution of a multivariate time series dataset with a GAN, which is then used to generate the missing values for each sample. Unlike image data, time series data are usually incomplete because of how the data are recorded. A modified Gated Recurrent Unit is employed in the GAN to model the temporal irregularity of the incomplete time series. Experiments on two multivariate time series datasets show that the proposed model outperforms the baselines in imputation accuracy. The results also show that a simple model trained on the imputed data can achieve state-of-the-art results on prediction tasks, demonstrating the benefit of our model in downstream applications.
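For orientation, the mean/zero imputation baselines named in the abstract can be sketched as below. This is a minimal illustration and not the paper's code: the function name `impute_baselines`, the use of NaN to mark missing entries, and the toy array are all assumptions made for the example. It also shows why these baselines ignore temporal structure: each feature is filled with a single constant regardless of when the gap occurs.

```python
import numpy as np

def impute_baselines(x):
    """Zero and per-feature mean imputation (hypothetical helper).

    x: array of shape (time_steps, features); NaN marks a missing value.
    Returns the observation mask and the two filled copies.
    """
    mask = ~np.isnan(x)                      # True where a value was observed
    zero_filled = np.where(mask, x, 0.0)     # zero imputation
    counts = mask.sum(axis=0)                # observed entries per feature
    sums = zero_filled.sum(axis=0)
    # Per-feature mean over observed entries; 0.0 if a feature is never observed.
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    mean_filled = np.where(mask, x, means)   # mean imputation
    return mask, zero_filled, mean_filled

# Toy series: 3 time steps, 2 features (assumed data, for illustration only).
series = np.array([[1.0, np.nan],
                   [np.nan, 4.0],
                   [3.0, 8.0]])
mask, zero_filled, mean_filled = impute_baselines(series)
```

Both fills depend only on per-feature statistics, so a gap at the first time step and a gap at the last receive the same value, which is the limitation the abstract points to.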

References

[1]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[2]  W. Wothke Longitudinal and multigroup modeling with missing data. , 2000 .

[3]  Gustavo E. A. P. A. Batista,et al.  An analysis of four missing data treatment methods for supervised learning , 2003, Appl. Artif. Intell..

[4]  D. Edwards Data Mining: Concepts, Models, Methods, and Algorithms , 2003 .

[5]  Edgar Acuña,et al.  The Treatment of Missing Values and its Effect on Classifier Accuracy , 2004 .

[6]  Tshilidzi Marwala,et al.  Missing data: A comparison of neural network and expectation maximization techniques , 2007 .

[7]  Patrick E. McKnight Missing Data: A Gentle Introduction , 2007 .

[8]  R. Perera Research methods journal club: a gentle introduction to imputation of missing values , 2008, Evidence-based medicine.

[9]  Aníbal R. Figueiras-Vidal,et al.  Pattern classification with missing data: a review , 2010, Neural Computing and Applications.

[10]  J. Graham,et al.  Missing data analysis: making it work in the real world. , 2009, Annual review of psychology.

[11]  Leslie S. Smith,et al.  A neural network-based framework for the reconstruction of incomplete data sets , 2010, Neurocomputing.

[12]  Robert Tibshirani,et al.  Spectral Regularization Algorithms for Learning Large Incomplete Matrices , 2010, J. Mach. Learn. Res..

[13]  Wei-Chang Yeh,et al.  Forecasting stock markets using wavelet transforms and recurrent neural networks: An integrated system based on artificial bee colony algorithm , 2011, Appl. Soft Comput..

[14]  G. Moody,et al.  Predicting in-hospital mortality of ICU patients: The PhysioNet/Computing in cardiology challenge 2012 , 2012, 2012 Computing in Cardiology.

[15]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[16]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[17]  Simon Osindero,et al.  Conditional Generative Adversarial Nets , 2014, ArXiv.

[18]  Luis E. Zárate,et al.  A brief review of the main approaches for treatment of missing data , 2014, Intell. Data Anal..

[19]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[20]  Jehanzeb R. Cheema A Review of Missing Data Handling Methods in Education Research , 2014 .

[21]  Gari D. Clifford,et al.  Data preprocessing and mortality prediction: The Physionet/CinC 2012 challenge revisited , 2014, Computing in Cardiology 2014.

[22]  Jiri Kaiser,et al.  Dealing with Missing Values in Data , 2014 .

[23]  Trevor J. Hastie,et al.  Matrix completion and low-rank SVD via fast alternating least squares , 2014, J. Mach. Learn. Res..

[24]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[25]  Pedro Abreu,et al.  Missing data imputation on the 5-year survival prediction of breast cancer patients with unknown discrete values , 2015, Comput. Biol. Medicine.

[26]  Pieter Abbeel,et al.  InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets , 2016, NIPS.

[27]  Mehran Amiri,et al.  Missing data imputation using fuzzy-rough methods , 2016, Neurocomputing.

[28]  J. Zico Kolter,et al.  Gradient descent GAN optimization is locally stable , 2017, NIPS.

[29]  Sepp Hochreiter,et al.  GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.

[30]  Lantao Yu,et al.  SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient , 2016, AAAI.

[31]  Zhenan Sun,et al.  Recent Progress of Face Image Synthesis , 2017, 2017 4th IAPR Asian Conference on Pattern Recognition (ACPR).

[32]  Yoshua Bengio,et al.  Maximum-Likelihood Augmented Discrete Generative Adversarial Networks , 2017, ArXiv.

[33]  Pinjia He,et al.  Semantically Consistent Image Completion with Fine-grained Details , 2017, ArXiv.

[34]  John E. Hopcroft,et al.  Stacked Generative Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Léon Bottou,et al.  Wasserstein Generative Adversarial Networks , 2017, ICML.

[36]  Ming-Hsuan Yang,et al.  Generative Face Completion , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Sandeep Subramanian,et al.  Adversarial Generation of Natural Language , 2017, Rep4NLP@ACL.

[38]  Eric Horvitz,et al.  Predicting Mortality of Intensive Care Patients via Learning about Hazard , 2017, AAAI.

[39]  Beng Chin Ooi,et al.  Resolving the Bias in Electronic Medical Records , 2017, KDD.

[40]  Christian Ledig,et al.  Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Kamalika Chaudhuri,et al.  Approximation and Convergence Properties of Generative Adversarial Learning , 2017, NIPS.

[42]  Georg Langs,et al.  Unsupervised Anomaly Detection with Generative Adversarial Networks to Guide Marker Discovery , 2017, IPMI.

[43]  Andrew M. Dai,et al.  MaskGAN: Better Text Generation via Filling in the ______ , 2018, ICLR.

[44]  Mihaela van der Schaar,et al.  GAIN: Missing Data Imputation using Generative Adversarial Nets , 2018, ICML.

[45]  Alexandros G. Dimakis,et al.  AmbientGAN: Generative models from lossy measurements , 2018, ICLR.

[46]  Yan Liu,et al.  Recurrent Neural Networks for Multivariate Time Series with Missing Values , 2016, Scientific Reports.

Cited By
Time Series Anomaly Detection for Smart Grids: A Survey
2021 IEEE Electrical Power and Energy Conference (EPEC)
2021
GAMIN: Generative Adversarial Multiple Imputation Network for Highly Missing Data
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
2020
Tomographic Auto-Encoder: Unsupervised Bayesian Recovery of Corrupted Data
ICLR
2020
Imputing Missing Observations with Time Sliced Synthetic Minority Oversampling Technique
ArXiv
2022
Data-driven Reconstruction of Nonlinear Dynamics from Sparse Observation
J. Comput. Phys.
2019
One-dimensional Deep Image Prior for Time Series Inverse Problems
2022 56th Asilomar Conference on Signals, Systems, and Computers
2019
A Two-Block RNN-Based Trajectory Prediction From Incomplete Trajectory
IEEE Access
2022
Imitative Non-Autoregressive Modeling for Trajectory Forecasting and Imputation
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
2020
Clairvoyance: A Pipeline Toolkit for Medical Time Series
ICLR
2023
Learning Representations for Incomplete Time Series Clustering
AAAI
2021
Latent ODEs for Irregularly-Sampled Time Series
ArXiv
2019
Recursive Input and State Estimation: a General Framework for Learning from Time Series With Missing Data
ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
2021
NRTSI: Non-Recurrent Time Series Imputation for Irregularly-sampled Data
ArXiv
2021
Towards Generating Real-World Time Series Data
2021 IEEE International Conference on Data Mining (ICDM)
2021
Time Series Data Imputation: A Survey on Deep Learning Approaches
ArXiv
2020
Mobile communication base station traffic forecast
2021
A Model-Agnostic Method for PMU Data Recovery Using Optimal Singular Value Thresholding
IEEE Transactions on Power Delivery
2021
Iterative Imputation of Missing Data Using Auto-Encoder Dynamics
ICONIP
2020
Can auto-encoders help with filling missing data?
ICLR 2020
2020
Entity Matching from Unstructured and Dissimilar Data Collections: Semantic and Content Distribution Approach
2020