Demand Forecasting in the Presence of Privileged Information

Forecasting future sales is a fundamental problem in the replenishment process of retail companies. Models that forecast the demand for an item typically rely on influential features and on the item's historical sales. However, the values of some influential features, which we refer to as non-plannable features, are known only at training time (for the past) and not at prediction time (for the future). Examples include sales in other channels, such as other stores of a supermarket chain. Existing forecasting methods either ignore non-plannable features or wrongly assume that they are also known at prediction time. We identify non-plannable features as privileged information, i.e., information that is available at training time but not at prediction time, and design a neural network that leverages this source of data accordingly. We present a dual-branch neural network architecture that incorporates non-plannable features at training time: a first branch embeds the historical information, and a second branch, the privileged information (PI) branch, predicts demand based on privileged information. At prediction time, we use a single-branch network in which a simulation component mimics the behavior of the PI branch, whose inputs are not available at prediction time. We evaluate our approach on two real-world forecasting datasets and find that it outperforms state-of-the-art competitors in terms of mean absolute error and symmetric mean absolute percentage error. We further provide visualizations and conduct experiments to validate the contribution of the different components of the proposed architecture.
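
To make the training-time and prediction-time data flow concrete, the following is a minimal sketch in Python with NumPy. It is not the architecture evaluated in the paper. The layer sizes, the single tanh layers, the concatenation before the output, the mean-squared mimic loss, and all names (`embed_history`, `pi_branch`, `simulate_pi`, `forward_train`, `forward_predict`) are assumptions made for illustration, and random weights stand in for trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
HIST_DIM, PI_DIM, HIDDEN = 8, 3, 4

# Random weights stand in for trained parameters (assumption).
W_hist = rng.normal(size=(HIDDEN, HIST_DIM))  # history-embedding branch
W_pi = rng.normal(size=(HIDDEN, PI_DIM))      # privileged information (PI) branch
W_sim = rng.normal(size=(HIDDEN, HIDDEN))     # simulation component
w_out = rng.normal(size=2 * HIDDEN)           # output layer over both representations


def embed_history(x_hist):
    """First branch: embed historical sales and plannable features."""
    return np.tanh(W_hist @ x_hist)


def pi_branch(x_pi):
    """Second branch: representation of the non-plannable (privileged) features."""
    return np.tanh(W_pi @ x_pi)


def simulate_pi(h):
    """Simulation component: approximate the PI branch output from h alone."""
    return np.tanh(W_sim @ h)


def forward_train(x_hist, x_pi):
    """Training-time pass: both branches are available."""
    h = embed_history(x_hist)
    p = pi_branch(x_pi)
    p_hat = simulate_pi(h)
    y = float(w_out @ np.concatenate([h, p]))
    # Assumed auxiliary loss that pushes the simulation towards the PI branch.
    mimic_loss = float(np.mean((p_hat - p) ** 2))
    return y, mimic_loss


def forward_predict(x_hist):
    """Prediction-time pass: privileged inputs are missing, so p is simulated."""
    h = embed_history(x_hist)
    p_hat = simulate_pi(h)
    return float(w_out @ np.concatenate([h, p_hat]))


x_hist = rng.normal(size=HIST_DIM)
x_pi = rng.normal(size=PI_DIM)
y_train, mimic_loss = forward_train(x_hist, x_pi)
y_pred = forward_predict(x_hist)
```

In this sketch, `forward_predict` never touches `x_pi`. The closer `simulate_pi` comes to reproducing the PI branch output, the closer the prediction-time forecast comes to the training-time one.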
