An Approximate Bayesian Long Short- Term Memory Algorithm for Outlier Detection

Long Short-Term Memory networks trained with gradient descent and back-propagation have received great success in various applications. However, point estimation of the weights of the networks is prone to over-fitting problems and lacks important uncertainty information associated with the estimation. However, exact Bayesian neural network methods are intractable and non-applicable for real-world applications. In this study, we propose an approximate estimation of the weights uncertainty using Ensemble Kalman Filter, which is easily scalable to a large number of weights. Furthermore, we optimize the covariance of the noise distribution in the ensemble update step using maximum likelihood estimation. To assess the proposed algorithm, we apply it to outlier detection in five realworld events retrieved from the Twitter platform.

[1]  Sharad Singhal,et al.  Training Multilayer Perceptrons with the Extende Kalman Algorithm , 1988, NIPS.

[2]  Geoffrey E. Hinton,et al.  Keeping the neural networks simple by minimizing the description length of the weights , 1993, COLT '93.

[3]  Fei-Fei Li,et al.  Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[5]  Jürgen Schmidhuber,et al.  Learning to forget: continual prediction with LSTM , 1999 .

[6]  Nicolò Cesa-Bianchi,et al.  Advances in Neural Information Processing Systems 31 , 2018, NIPS 2018.

[7]  Hermann Hellwagner,et al.  Social media for crisis management: clustering approaches for sub-event detection , 2015, Multimedia Tools and Applications.

[8]  Julien Cornebise,et al.  Weight Uncertainty in Neural Networks , 2015, ArXiv.

[9]  Zoubin Ghahramani,et al.  A Theoretically Grounded Application of Dropout in Recurrent Neural Networks , 2015, NIPS.

[10]  Geoffrey E. Hinton,et al.  Bayesian Learning for Neural Networks , 1995 .

[11]  Georgiana Dinu,et al.  Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors , 2014, ACL.

[12]  Hermann Hellwagner,et al.  Automatic sub-event detection in emergency management using social media , 2012, WWW.

[13]  Rik Warren,et al.  Use of Mahalanobis Distance for Detecting Outliers and Outlier Clusters in Markedly Non-Normal Data: A Vehicular Traffic Example , 2011 .

[14]  Ron Meir,et al.  Expectation Backpropagation: Parameter-Free Training of Multilayer Neural Networks with Continuous or Discrete Weights , 2014, NIPS.

[15]  Rudolph van der Merwe,et al.  The unscented Kalman filter for nonlinear estimation , 2000, Proceedings of the IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium (Cat. No.00EX373).

[16]  Ryan P. Adams,et al.  Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks , 2015, ICML.

[17]  Geir Evensen,et al.  The Ensemble Kalman Filter: theoretical formulation and practical implementation , 2003 .

[18]  Jeffrey K. Uhlmann,et al.  Unscented filtering and nonlinear estimation , 2004, Proceedings of the IEEE.

[19]  Alex Graves,et al.  Practical Variational Inference for Neural Networks , 2011, NIPS.

[20]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[21]  Julien Cornebise,et al.  Weight Uncertainty in Neural Network , 2015, ICML.

[22]  Zoubin Ghahramani,et al.  Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning , 2015, ICML.

[23]  Léon Personnaz,et al.  A recursive algorithm based on the extended Kalman filter for the training of feedforward neural models , 1998, Neurocomputing.

[24]  David J. C. MacKay,et al.  A Practical Bayesian Framework for Backpropagation Networks , 1992, Neural Computation.

[25]  Yannis Stavrakas,et al.  Degeneracy-Based Real-Time Sub-Event Detection in Twitter Stream , 2015, ICWSM.

[26]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[27]  Alex Graves,et al.  Generating Sequences With Recurrent Neural Networks , 2013, ArXiv.

[28]  Andrew W. Senior,et al.  Long short-term memory recurrent neural network architectures for large scale acoustic modeling , 2014, INTERSPEECH.