Paper information - Multivariate Time Series Imputation with Generative Adversarial Networks

SW1PerS: Sliding windows and 1-persistence scoring; discovering periodicity in gene expression time series data

Background: Identifying periodically expressed genes across different processes (e.g., the cell and metabolic cycles, circadian rhythms, etc.) is a central problem in computational biology. Biological time series may contain (multiple) unknown signal shapes of systemic relevance, imperfections such as noise, damping, and trending, or limited sampling density. While methods for detecting periodicity exist, their design biases (e.g., toward a specific signal shape) can limit their applicability in one or more of these situations.

Methods: We present in this paper a novel method, SW1PerS, for quantifying periodicity in time series in a shape-agnostic manner and with resistance to damping. The measurement is performed directly, without presupposing a particular pattern, by evaluating the circularity of a high-dimensional representation of the signal. SW1PerS is compared to other algorithms using synthetic data, and performance is quantified under varying noise models, noise levels, sampling densities, and signal shapes. Results on biological data are also analyzed and compared.

Results: On the task of periodic/non-periodic classification using synthetic data, SW1PerS outperforms all other algorithms in the low-noise regime. SW1PerS is shown to be the most shape-agnostic of the evaluated methods, and the only one to consistently classify damped signals as highly periodic. On biological data, and for several experiments, the lists of top 10% genes ranked with SW1PerS recover up to 67% of those generated with other popular algorithms. Moreover, the list of genes from yeast metabolic cycle data that are highly ranked only by SW1PerS contains evidently non-cosine patterns (e.g., ECM33, CDC9, SAM1,2 and MSH6) with highly periodic expression profiles. In data from the yeast cell cycle, SW1PerS identifies genes not preferred by other algorithms, and hence not previously reported as periodic, but found in other experiments such as the universal growth rate response of Slavov. These genes are BOP3, CDC10, YIL108W, YER034W, MLP1, PAC2 and RTT101.

Conclusions: In biological systems with low noise, i.e., where periodic signals with interesting shapes are more likely to occur, SW1PerS can be used as a powerful tool in exploratory analyses. Indeed, by starting from an initial set of periodic genes with a rich variety of signal types, pattern/shape information can be included in the study of systems and in the generation of hypotheses about the structure of gene regulatory networks.
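The abstract only says that SW1PerS evaluates the circularity of a high-dimensional representation of the signal. Assuming that representation is the standard sliding-window (time-delay) embedding suggested by the method's name, a minimal Python sketch of that first step could look like this. The helper name, the integer sample delay `tau`, and the toy cosine are illustrative assumptions; the 1-persistence scoring that SW1PerS applies to the resulting point cloud is not reproduced here.

```python
import math
from typing import List, Sequence


def sliding_window_embedding(x: Sequence[float], dim: int, tau: int) -> List[List[float]]:
    """Hypothetical helper: point i is (x[i], x[i + tau], ..., x[i + dim * tau]).

    Uses an integer sample delay tau (an assumption; a continuous delay would
    require interpolating the series first).
    """
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive")
    n_points = len(x) - dim * tau
    if n_points <= 0:
        raise ValueError("series is too short for this window")
    return [[x[i + k * tau] for k in range(dim + 1)] for i in range(n_points)]


# Toy signal: a cosine sampled 20 times per period.
period = 20
signal = [math.cos(2 * math.pi * i / period) for i in range(100)]

# With dim = 1 and tau = period / 4, each point is (cos t, cos(t + pi/2)) = (cos t, -sin t),
# so the whole point cloud lies on the unit circle.
cloud = sliding_window_embedding(signal, dim=1, tau=period // 4)
radii = [math.hypot(px, py) for px, py in cloud]
print(len(cloud), round(min(radii), 6), round(max(radii), 6))  # 95 1.0 1.0
```

Because the toy signal repeats every 20 samples, point i and point i + 20 coincide and the cloud traces a closed loop; a non-periodic signal would not close up this way, which is the kind of structure a circularity score is meant to detect.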

Multivariate Time Series Imputation with Generative Adversarial Networks

Ying Zhang

Xiaojie Yuan

Jun Xu

Yonghong Luo

Xiangrui Cai

Abstract: Multivariate time series usually contain a large number of missing values, which hinders the application of advanced analysis methods to multivariate time series data. Conventional approaches to missing values, including mean/zero imputation, case deletion, and matrix factorization-based imputation, are all incapable of modeling the temporal dependencies and the complex distributions of multivariate time series. In this paper, we treat missing value imputation as a data generation problem. Inspired by the success of Generative Adversarial Networks (GAN) in image generation, we propose to learn the overall distribution of a multivariate time series dataset with a GAN, which is then used to generate the missing values for each sample. Unlike image data, time series data are usually incomplete due to the nature of the data recording process. A modified Gated Recurrent Unit is employed in the GAN to model the temporal irregularity of the incomplete time series. Experiments on two multivariate time series datasets show that the proposed model outperforms the baselines in terms of imputation accuracy. Experimental results also show that a simple model trained on the imputed data can achieve state-of-the-art results on the prediction tasks, demonstrating the benefits of our model in downstream applications.
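The abstract states only that a modified Gated Recurrent Unit models the temporal irregularity of incomplete series and that the learned distribution is used to fill in missing values. A minimal Python sketch of one plausible bookkeeping layer is shown below, assuming the time-lag and exponential-decay scheme of GRU-D from reference [46] and assuming generated values are used only at masked (missing) positions. Every function name, the scalar weight `w` and bias `b`, and the placeholder generator output are hypothetical; this is not the authors' implementation, and the GRU update and adversarial training are omitted.

```python
import math
from typing import List, Optional

# Rows are time steps, columns are variables; None marks a missing value.
Series = List[List[Optional[float]]]


def build_mask(x: Series) -> List[List[int]]:
    """Mask M: 1 where a value was observed, 0 where it is missing."""
    return [[0 if v is None else 1 for v in row] for row in x]


def time_lags(stamps: List[float], mask: List[List[int]]) -> List[List[float]]:
    """GRU-D-style lag: time elapsed since each variable was last observed (0 at step 0)."""
    n_vars = len(mask[0])
    delta = [[0.0] * n_vars]
    for t in range(1, len(stamps)):
        gap = stamps[t] - stamps[t - 1]
        delta.append([
            gap + delta[t - 1][d] if mask[t - 1][d] == 0 else gap
            for d in range(n_vars)
        ])
    return delta


def decay(delta: List[List[float]], w: float, b: float) -> List[List[float]]:
    """beta = exp(-max(0, w * delta + b)): 1.0 means no decay, smaller means forget more.

    Assumption: a single scalar weight/bias stands in for the learned parameters.
    """
    return [[math.exp(-max(0.0, w * v + b)) for v in row] for row in delta]


def combine(x: Series, mask: List[List[int]], generated: List[List[float]]) -> List[List[float]]:
    """Keep observed entries; take generated entries only where the mask is 0."""
    return [
        [x[t][d] if mask[t][d] else generated[t][d] for d in range(len(row))]
        for t, row in enumerate(mask)
    ]


# Toy series with irregular timestamps (hypothetical data).
x: Series = [[1.0, None], [None, 2.0], [3.0, None], [4.0, 5.0]]
stamps = [0.0, 1.0, 3.0, 4.0]

mask = build_mask(x)
delta = time_lags(stamps, mask)        # [[0.0, 0.0], [1.0, 1.0], [3.0, 2.0], [1.0, 3.0]]
beta = decay(delta, w=0.5, b=0.0)      # longer gaps give smaller decay factors
generated = [[-1.0, -1.0] for _ in x]  # placeholder for a trained generator's output
filled = combine(x, mask, generated)
print(filled)  # [[1.0, -1.0], [-1.0, 2.0], [3.0, -1.0], [4.0, 5.0]]
```

In GRU-D, a decay factor of this form (computed with learned matrices rather than one scalar) scales the previous hidden state before the GRU update, so a variable that has been missing for a long stretch contributes less stale memory.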

References

[1] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.

[2] W. Wothke. Longitudinal and multigroup modeling with missing data, 2000.

[3] Gustavo E. A. P. A. Batista, et al. An analysis of four missing data treatment methods for supervised learning, 2003, Appl. Artif. Intell.

[4] D. Edwards. Data Mining: Concepts, Models, Methods, and Algorithms, 2003.

[5] Edgar Acuña, et al. The Treatment of Missing Values and its Effect on Classifier Accuracy, 2004.

[6] Tshilidzi Marwala, et al. Missing data: A comparison of neural network and expectation maximization techniques, 2007.

[7] Patrick E. McKnight. Missing Data: A Gentle Introduction, 2007.

[8] R. Perera. Research methods journal club: a gentle introduction to imputation of missing values, 2008, Evidence-based medicine.

[9] Aníbal R. Figueiras-Vidal, et al. Pattern classification with missing data: a review, 2010, Neural Computing and Applications.

[10] J. Graham, et al. Missing data analysis: making it work in the real world, 2009, Annual review of psychology.

[11] Leslie S. Smith, et al. A neural network-based framework for the reconstruction of incomplete data sets, 2010, Neurocomputing.

[12] Robert Tibshirani, et al. Spectral Regularization Algorithms for Learning Large Incomplete Matrices, 2010, J. Mach. Learn. Res.

[13] Wei-Chang Yeh, et al. Forecasting stock markets using wavelet transforms and recurrent neural networks: An integrated system based on artificial bee colony algorithm, 2011, Appl. Soft Comput.

[14] G. Moody, et al. Predicting in-hospital mortality of ICU patients: The PhysioNet/Computing in cardiology challenge 2012, 2012, 2012 Computing in Cardiology.

[15] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.

[16] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.

[17] Simon Osindero, et al. Conditional Generative Adversarial Nets, 2014, ArXiv.

[18] Luis E. Zárate, et al. A brief review of the main approaches for treatment of missing data, 2014, Intell. Data Anal.

[19] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.

[20] Jehanzeb R. Cheema. A Review of Missing Data Handling Methods in Education Research, 2014.

[21] Gari D. Clifford, et al. Data preprocessing and mortality prediction: The Physionet/CinC 2012 challenge revisited, 2014, Computing in Cardiology 2014.

[22] Jiri Kaiser, et al. Dealing with Missing Values in Data, 2014.

[23] Trevor J. Hastie, et al. Matrix completion and low-rank SVD via fast alternating least squares, 2014, J. Mach. Learn. Res.

[24] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.

[25] Pedro Abreu, et al. Missing data imputation on the 5-year survival prediction of breast cancer patients with unknown discrete values, 2015, Comput. Biol. Medicine.

[26] Pieter Abbeel, et al. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets, 2016, NIPS.

[27] Mehran Amiri, et al. Missing data imputation using fuzzy-rough methods, 2016, Neurocomputing.

[28] J. Zico Kolter, et al. Gradient descent GAN optimization is locally stable, 2017, NIPS.

[29] Sepp Hochreiter, et al. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium, 2017, NIPS.

[30] Lantao Yu, et al. SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient, 2016, AAAI.

[31] Zhenan Sun, et al. Recent Progress of Face Image Synthesis, 2017, 2017 4th IAPR Asian Conference on Pattern Recognition (ACPR).

[32] Yoshua Bengio, et al. Maximum-Likelihood Augmented Discrete Generative Adversarial Networks, 2017, ArXiv.

[33] Pinjia He, et al. Semantically Consistent Image Completion with Fine-grained Details, 2017, ArXiv.

[34] John E. Hopcroft, et al. Stacked Generative Adversarial Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35] Léon Bottou, et al. Wasserstein Generative Adversarial Networks, 2017, ICML.

[36] Ming-Hsuan Yang, et al. Generative Face Completion, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37] Sandeep Subramanian, et al. Adversarial Generation of Natural Language, 2017, Rep4NLP@ACL.

[38] Eric Horvitz, et al. Predicting Mortality of Intensive Care Patients via Learning about Hazard, 2017, AAAI.

[39] Beng Chin Ooi, et al. Resolving the Bias in Electronic Medical Records, 2017, KDD.

[40] Christian Ledig, et al. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41] Kamalika Chaudhuri, et al. Approximation and Convergence Properties of Generative Adversarial Learning, 2017, NIPS.

[42] Georg Langs, et al. Unsupervised Anomaly Detection with Generative Adversarial Networks to Guide Marker Discovery, 2017, IPMI.

[43] Andrew M. Dai, et al. MaskGAN: Better Text Generation via Filling in the ______, 2018, ICLR.

[44] Mihaela van der Schaar, et al. GAIN: Missing Data Imputation using Generative Adversarial Nets, 2018, ICML.

[45] Alexandros G. Dimakis, et al. AmbientGAN: Generative models from lossy measurements, 2018, ICLR.

[46] Yan Liu, et al. Recurrent Neural Networks for Multivariate Time Series with Missing Values, 2016, Scientific Reports.

Cited by

Time Series Anomaly Detection for Smart Grids: A Survey. 2021 IEEE Electrical Power and Energy Conference (EPEC), 2021.

GAMIN: Generative Adversarial Multiple Imputation Network for Highly Missing Data. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.

Tomographic Auto-Encoder: Unsupervised Bayesian Recovery of Corrupted Data. ICLR, 2020.

Imputing Missing Observations with Time Sliced Synthetic Minority Oversampling Technique. ArXiv, 2022.

Recursive Input and State Estimation: a General Framework for Learning from Time Series With Missing Data. ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021.

NRTSI: Non-Recurrent Time Series Imputation for Irregularly-sampled Data. ArXiv, 2021.

Towards Generating Real-World Time Series Data. 2021 IEEE International Conference on Data Mining (ICDM), 2021.

Time Series Data Imputation: A Survey on Deep Learning Approaches. ArXiv, 2020.

Mobile communication base station traffic forecast. 2021.

A Model-Agnostic Method for PMU Data Recovery Using Optimal Singular Value Thresholding. IEEE Transactions on Power Delivery, 2021.

Iterative Imputation of Missing Data Using Auto-Encoder Dynamics. ICONIP, 2020.

Can auto-encoders help with filling missing data? ICLR 2020, 2020.

Entity Matching from Unstructured and Dissimilar Data Collections: Semantic and Content Distribution Approach. 2020.
