Parameter Efficient Deep Probabilistic Forecasting

Probabilistic time series forecasting is crucial in many application domains, such as retail, e-commerce, finance, and biology. With the increasing availability of large volumes of data, a number of neural architectures have been proposed for this problem. In particular, Transformer-based methods achieve state-of-the-art performance on real-world benchmarks. However, these methods require a large number of learnable parameters, which imposes high memory requirements on the computational resources used to train such models. To address this problem, we introduce a novel Bidirectional Temporal Convolutional Network (BiTCN), which requires an order of magnitude fewer parameters than a common Transformer-based approach. Our model combines two Temporal Convolutional Networks (TCNs): the first network encodes future covariates of the time series, whereas the second network encodes past observations and covariates. We jointly estimate the parameters of an output distribution via these two networks. Experiments on four real-world datasets show that our method performs on par with four state-of-the-art probabilistic forecasting methods, including a Transformer-based approach and WaveNet, on two point metrics (sMAPE, NRMSE) as well as on a set of range metrics (quantile loss percentiles) in the majority of cases. Furthermore, we demonstrate that our method requires significantly fewer parameters than Transformer-based methods, which means the model can be trained faster with significantly lower memory requirements, and consequently at lower infrastructure cost when deploying these models.
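To make the two-network idea concrete, below is a minimal PyTorch sketch of the bidirectional TCN structure the abstract describes: one dilated causal TCN over past observations and covariates, a second TCN run over time-reversed future covariates so it aggregates only future information, and a head that maps the combined features to distribution parameters. The layer sizes, the Gaussian output head, and all names (CausalConv1d, TCN, BiTCN) are illustrative assumptions, not the paper's exact architecture.

```python
# A minimal sketch of the BiTCN idea, assuming a Gaussian output
# distribution and simple dilated convolution stacks; not the
# paper's exact architecture or hyperparameters.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalConv1d(nn.Module):
    """Dilated 1-D convolution that only looks at past time steps."""
    def __init__(self, in_ch, out_ch, kernel_size, dilation):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation  # left padding keeps causality
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):                # x: (batch, channels, time)
        x = F.pad(x, (self.pad, 0))      # pad on the left only
        return self.conv(x)


class TCN(nn.Module):
    """Stack of dilated causal convolutions with GELU activations."""
    def __init__(self, in_ch, hidden, layers, kernel_size=2):
        super().__init__()
        chans = [in_ch] + [hidden] * layers
        self.blocks = nn.ModuleList(
            CausalConv1d(chans[i], chans[i + 1], kernel_size, dilation=2 ** i)
            for i in range(layers)
        )

    def forward(self, x):
        for block in self.blocks:
            x = F.gelu(block(x))
        return x


class BiTCN(nn.Module):
    """Two TCNs: one over past observations and covariates (causal), one
    over future covariates (run on reversed time, so it sees only the
    future). Their features jointly predict distribution parameters."""
    def __init__(self, past_dim, future_dim, hidden=32, layers=4):
        super().__init__()
        self.past_tcn = TCN(past_dim, hidden, layers)
        self.future_tcn = TCN(future_dim, hidden, layers)
        self.head = nn.Linear(2 * hidden, 2)  # Gaussian mean and scale

    def forward(self, past, future):
        # past:   (batch, past_dim, T)   -- lagged target + past covariates
        # future: (batch, future_dim, T) -- known future covariates
        h_past = self.past_tcn(past)
        # flip time so the "causal" conv aggregates future information
        h_future = self.future_tcn(future.flip(-1)).flip(-1)
        h = torch.cat([h_past, h_future], dim=1)  # (batch, 2*hidden, T)
        params = self.head(h.transpose(1, 2))     # (batch, T, 2)
        mean, scale = params[..., 0], F.softplus(params[..., 1])
        return torch.distributions.Normal(mean, scale)


# Usage: train by minimizing the negative log-likelihood of the
# observed targets under the predicted distribution.
model = BiTCN(past_dim=5, future_dim=3)
past = torch.randn(8, 5, 24)
future = torch.randn(8, 3, 24)
target = torch.randn(8, 24)
loss = -model(past, future).log_prob(target).mean()
```

Because both encoders are convolutional, the parameter count grows with the number of layers and channels rather than with sequence length, which is the source of the memory advantage over attention-based models claimed above.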
