Uncertainty Modelling in Deep Networks: Forecasting Short and Noisy Series

Deep Learning is a well-established, state-of-the-art Machine Learning tool for fitting a function when large data sets of examples are available. However, in regression tasks, the straightforward application of Deep Learning models yields only a point estimate of the target and does not account for the uncertainty of the prediction. This is a serious limitation for tasks where communicating an erroneous prediction carries a risk. In this paper we tackle a real-world problem: forecasting the impending financial expenses and income of customers, while displaying the predictable monetary amounts on a mobile app. In this context, we investigate whether there is an advantage in applying Deep Learning models with a heteroscedastic model of the variance of a network's output. Experimentally, we achieve higher accuracy than non-trivial baselines. More importantly, we introduce a mechanism to discard low-confidence predictions, so that they are not shown to users. This should help enhance the user experience of our product.
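As a rough illustration of the two ideas above, the following sketch shows one common way to train a network with a heteroscedastic variance output (a Gaussian negative log-likelihood over a predicted mean and log-variance) and one simple way to suppress low-confidence predictions (a threshold on the predicted standard deviation). The abstract does not specify the likelihood, the parameterisation, or the discarding rule, so the Gaussian assumption, the function names `gaussian_nll` and `filter_predictions`, the `max_std` threshold, and the example numbers are all assumptions made here for illustration and are not taken from the paper.

```python
import math


def gaussian_nll(y, mean, log_var):
    """Negative log-likelihood of y under N(mean, exp(log_var)).

    Assumption: the network emits a mean and a log-variance per example.
    Predicting the log-variance keeps the variance positive.
    """
    return 0.5 * (log_var + (y - mean) ** 2 / math.exp(log_var)
                  + math.log(2.0 * math.pi))


def filter_predictions(means, log_vars, max_std):
    """Replace predictions whose predicted std exceeds max_std with None.

    max_std is a hypothetical confidence threshold; in practice it would
    be tuned on held-out data.
    """
    shown = []
    for mean, log_var in zip(means, log_vars):
        std = math.exp(0.5 * log_var)
        shown.append(mean if std <= max_std else None)
    return shown


# Made-up predicted amounts and variances, for illustration only.
means = [120.0, 45.0, 300.0]
log_vars = [math.log(4.0), math.log(400.0), math.log(25.0)]

# Only predictions with a predicted std of at most 10 would be displayed.
shown = filter_predictions(means, log_vars, max_std=10.0)
print(shown)
```

Under this loss, an example with a large error can be accommodated either by moving the mean or by inflating the predicted variance, which is what lets the variance output act as a per-prediction confidence signal. The second prediction above has a predicted standard deviation of 20 and would therefore be withheld from the user.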
