ScoreGrad: Multivariate Probabilistic Time Series Forecasting with Continuous Energy-based Generative Models

Multivariate time series prediction has attracted a lot of attention because of its wide range of applications, such as intelligent transportation and AIOps. Generative models have achieved impressive results in time series modeling because they can model the data distribution and take noise into consideration. However, many existing works cannot be widely used because of constraints on the functional form of the generative model or sensitivity to hyperparameters. In this paper, we propose ScoreGrad, a multivariate probabilistic time series forecasting framework based on continuous energy-based generative models. ScoreGrad is composed of a time series feature extraction module and a conditional stochastic differential equation (SDE) based score matching module. Prediction is achieved by iteratively solving a reverse-time SDE. To the best of our knowledge, ScoreGrad is the first continuous energy-based generative model used for time series forecasting. Furthermore, ScoreGrad achieves state-of-the-art results on six real-world datasets. The impact of hyperparameters and sampler types on performance is also explored. Code is available at https://github.com/yantijin/ScoreGradPred.
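
To illustrate the forecasting step described above, here is a minimal sketch of a reverse-time SDE sampling loop, assuming a VP-type SDE with a linear beta(t) schedule and Euler-Maruyama discretization. The names `score_net` (the conditional score network) and `hidden` (features produced by the time series feature extraction module) are hypothetical placeholders, not the repository's actual API.

```python
import math
import torch

@torch.no_grad()
def reverse_sde_sample(score_net, hidden, x_shape, n_steps=100,
                       beta_min=0.1, beta_max=20.0):
    """Draw one forecast sample by integrating a reverse-time VP-SDE
    with Euler-Maruyama steps, conditioned on time series features."""
    x = torch.randn(x_shape)            # start from the terminal prior N(0, I)
    dt = 1.0 / n_steps                  # step size; t runs from 1 down to 0
    for i in range(n_steps):
        t = torch.full((x_shape[0],), 1.0 - i * dt)
        beta_t = (beta_min + t * (beta_max - beta_min))[:, None]  # linear beta(t)
        # score_net approximates grad_x log p_t(x | hidden)
        score = score_net(x, hidden, t)
        # reverse-time drift: f(x, t) - g(t)^2 * score,
        # with f(x, t) = -0.5 * beta(t) * x and g(t)^2 = beta(t)
        drift = -0.5 * beta_t * x - beta_t * score
        # Euler-Maruyama step backward in time, with fresh Gaussian noise
        x = x - drift * dt + torch.sqrt(beta_t) * math.sqrt(dt) * torch.randn_like(x)
    return x
```

Repeating this sampling procedure yields an empirical predictive distribution over future values, from which quantiles or point forecasts can be read off; predictor-corrector or ODE-based samplers are alternative choices the paper also explores.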
