MQTransformer: Multi-Horizon Forecasts with Context Dependent and Feedback-Aware Attention

Recent advances in neural forecasting have produced major improvements in the accuracy of probabilistic demand prediction. In this work, we propose novel improvements to the current state of the art, incorporating changes inspired by recent advances in Transformer architectures for Natural Language Processing. We develop a novel decoder-encoder attention scheme for context alignment, which improves forecasting accuracy by allowing the network to study its own history conditioned on the context for which it is producing a forecast. We also present a novel positional encoding that allows the neural network to learn context-dependent seasonality functions as well as arbitrary holiday distances. Finally, we show that the current state-of-the-art MQ-Forecaster (Wen et al., 2017) models display excess variability because they fail to leverage previous errors in the forecast to improve accuracy. We propose a novel decoder self-attention scheme for forecasting that produces significant improvements in the excess variation of the forecast.
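
The two attention changes described above can be made concrete with a short sketch: each forecast horizon gets a learned embedding (standing in for the learned, context-dependent positional encoding), forms its own query over the encoded history (decoder-encoder attention for context alignment), and the resulting horizon states then attend to one another (decoder self-attention). This is a minimal illustration under our own assumptions, not the authors' implementation; the class name HorizonAttentionDecoder, the module layout, and all dimensions are invented for the example.

    import torch
    import torch.nn as nn

    class HorizonAttentionDecoder(nn.Module):
        def __init__(self, d_model=64, n_heads=4, horizons=12, n_quantiles=3):
            super().__init__()
            # Learned per-horizon embeddings: a stand-in for the paper's
            # context-dependent positional encoding.
            self.horizon_emb = nn.Embedding(horizons, d_model)
            # Decoder-encoder attention: each horizon queries the encoded history.
            self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            # Decoder self-attention: horizon states attend to each other, a rough
            # analogue of letting the decoder revisit its own past states/errors.
            self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.out = nn.Linear(d_model, n_quantiles)  # one output per quantile

        def forward(self, enc_states):
            # enc_states: (batch, history_len, d_model), from any sequence
            # encoder (e.g. the convolutional encoder of an MQ-CNN-style model).
            b = enc_states.size(0)
            q = self.horizon_emb.weight.unsqueeze(0).expand(b, -1, -1)
            ctx, _ = self.cross_attn(q, enc_states, enc_states)  # context alignment
            dec, _ = self.self_attn(ctx, ctx, ctx)               # cross-horizon feedback
            return self.out(dec)  # (batch, horizons, n_quantiles)

    # Usage: encode 28 steps of history, decode 12 horizons at 3 quantiles.
    enc = torch.randn(8, 28, 64)                 # stand-in for encoder output
    print(HorizonAttentionDecoder()(enc).shape)  # torch.Size([8, 12, 3])

The horizon embedding doubles as the decoder's positional signal, so replacing it with an embedding of calendar features (seasonal phase, distance to a holiday) would be the natural way to approximate the context-dependent encoding the abstract describes.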

[1] Yisong Yue, et al. Long-term Forecasting using Higher Order Tensor RNNs, 2017.

[2] Fabio Porto, et al. STConvS2S: Spatiotemporal Convolutional Sequence to Sequence Network for Weather Forecasting, 2019, ArXiv.

[3] Nassim Nicholas Taleb. Election predictions as martingales: an arbitrage approach, 2017.

[4] Andrew M. Dai, et al. Music Transformer: Generating Music with Long-Term Structure, 2018, ICLR.

[5] Paolo Torroni, et al. Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing, 2019, ArXiv.

[6] Frank Y. Chen, et al. Quantifying the Bullwhip Effect in a Simple Supply Chain: The Impact of Forecasting, Lead Times, and Information, 2000.

[7] Syama Sundar Rangapuram, et al. Neural forecasting: Introduction and literature overview, 2020, ArXiv.

[8] D. Heath, et al. Modelling the evolution of demand forecasts with application to safety stock analysis in production distribution systems, 1994.

[9] Dhruv Madeka, et al. Sample Path Generation for Probabilistic Demand Forecasting, 2018.

[10] Ashish Vaswani, et al. Self-Attention with Relative Position Representations, 2018, NAACL.

[11] Hung-yi Lee, et al. Temporal pattern attention for multivariate time series forecasting, 2018, Machine Learning.

[12] Christopher D. Manning, et al. Effective Approaches to Attention-based Neural Machine Translation, 2015, EMNLP.

[13] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.

[14] Myungjoo Kang, et al. Financial series prediction using Attention LSTM, 2019, ArXiv.

[15] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.

[16] Nicolas Loeff, et al. Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting, 2019, International Journal of Forecasting.

[17] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.

[18] Wenhu Chen, et al. Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting, 2019, NeurIPS.

[19] Sandeep K. Shukla, et al. Sequence to sequence deep learning models for solar irradiation forecasting, 2019, 2019 IEEE Milan PowerTech.

[20] Yoshua Bengio, et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, 2015, ICML.

[21] Yiming Yang, et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context, 2019, ACL.

[22] Ruofeng Wen, et al. Deep Generative Quantile-Copula Models for Probabilistic Forecasting, 2019, ArXiv.

[23] Vadim V. Strijov, et al. Position-Based Content Attention for Time Series Forecasting with Sequence-to-Sequence RNNs, 2017, ICONIP.

[24] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[25] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.

[26] Haim Mendelson, et al. Information Transmission and the Bullwhip Effect: An Empirical Investigation, 2012, Management Science.

[27] David Williams. Probability with Martingales, 1991, Cambridge Mathematical Textbooks.

[28] M. Rabin, et al. Belief Movement, Uncertainty Reduction, and Rational Updating, 2017.

[29] K. Torkkola, et al. A Multi-Horizon Quantile Recurrent Forecaster, 2017, ArXiv.

[30] Cesare Alippi, et al. Deep Learning for Time Series Forecasting: The Electric Load Case, 2019, CAAI Transactions on Intelligence Technology.

[31] Valentin Flunkert, et al. DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks, 2017, International Journal of Forecasting.

[32] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[33] Mirella Lapata, et al. Long Short-Term Memory-Networks for Machine Reading, 2016, EMNLP.