Time-series forecasting with deep learning: a survey

Numerous deep learning architectures have been developed to accommodate the diversity of time-series datasets across different domains. In this article, we survey common encoder and decoder designs used in both one-step-ahead and multi-horizon time-series forecasting, describing how each model incorporates temporal information into its predictions. Next, we highlight recent developments in hybrid deep learning models, which combine well-studied statistical models with neural network components to improve on pure methods in either category. Lastly, we outline some ways in which deep learning can also facilitate decision support with time-series data. This article is part of the theme issue ‘Machine learning for weather and climate modelling’.
