Foundations of Sequence-to-Sequence Modeling for Time Series

The availability of large amounts of time series data, paired with the performance of deep-learning algorithms on a broad class of problems, has recently led to significant interest in the use of sequence-to-sequence models for time series forecasting. We provide the first theoretical analysis of this time series forecasting framework. We include a comparison of sequence-to-sequence modeling to classical time series models, and as such our theory can serve as a quantitative guide for practitioners choosing between different modeling methodologies.
