Understanding LSTM - a tutorial into Long Short-Term Memory Recurrent Neural Networks

Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) are one of the most powerful dynamic classifiers publicly known. The network itself and the related learning algorithms are reasonably well documented to get an idea how it works. This paper will shed more light into understanding how LSTM-RNNs evolved and why they work impressively well, focusing on the early, ground-breaking publications. We significantly improved documentation and fixed a number of errors and inconsistencies that accumulated in previous publications. To support understanding we as well revised and unified the notation used.

[1]  Yoshua Bengio,et al.  Attention-Based Models for Speech Recognition , 2015, NIPS.

[2]  Jürgen Schmidhuber,et al.  A Clockwork RNN , 2014, ICML.

[3]  Jürgen Schmidhuber,et al.  LSTM recurrent networks learn simple context-free and context-sensitive languages , 2001, IEEE Trans. Neural Networks.

[4]  Yoshua Bengio,et al.  Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.

[5]  Quoc V. Le,et al.  Addressing the Rare Word Problem in Neural Machine Translation , 2014, ACL.

[6]  Yoshua Bengio,et al.  End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results , 2014, ArXiv.

[7]  Alex Graves,et al.  Rapid Retraining on Speech Data with LSTM Recurrent Networks. , 2005 .

[8]  Klaus Obermayer,et al.  Fast model-based protein homology detection without alignment , 2007, Bioinform..

[9]  Hervé Bourlard,et al.  Connectionist Speech Recognition: A Hybrid Approach , 1993 .

[10]  Jun Zhu,et al.  Revisit Long Short-Term Memory: An Optimization Perspective , 2015 .

[11]  Narendra S. Chaudhari,et al.  Capturing Long-Term Dependencies for Protein Secondary Structure Prediction , 2004, ISNN.

[12]  Jürgen Schmidhuber,et al.  Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks , 2008, NIPS.

[13]  Julian Togelius,et al.  Evolving Memory Cell Structures for Sequence Learning , 2009, ICANN.

[14]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[15]  Ronald J. Williams,et al.  A Learning Algorithm for Continually Running Fully Recurrent Neural Networks , 1989, Neural Computation.

[16]  Björn W. Schuller,et al.  Keyword spotting exploiting Long Short-Term Memory , 2013, Speech Commun..

[17]  Sepp Hochreiter,et al.  Learning to Learn Using Gradient Descent , 2001, ICANN.

[18]  Wojciech Zaremba,et al.  Recurrent Neural Network Regularization , 2014, ArXiv.

[19]  Jürgen Schmidhuber,et al.  Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.

[20]  Alex Graves,et al.  Grid Long Short-Term Memory , 2015, ICLR.

[21]  Jürgen Schmidhuber,et al.  Sequence Labelling in Structured Domains with Hierarchical Recurrent Neural Networks , 2007, IJCAI.

[22]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[23]  Yoshua Bengio,et al.  A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..

[24]  Navdeep Jaitly,et al.  Towards End-To-End Speech Recognition with Recurrent Neural Networks , 2014, ICML.

[25]  Geoffrey E. Hinton,et al.  Generating Text with Recurrent Neural Networks , 2011, ICML.

[26]  Jürgen Schmidhuber,et al.  Classifying Unprompted Speech by Retraining LSTM Nets , 2005, ICANN.

[27]  Jürgen Schmidhuber,et al.  Kalman filters improve LSTM network performance in problems unsolvable by traditional recurrent nets , 2003, Neural Networks.

[28]  A. Graves,et al.  Unconstrained Online Handwriting Recognition with Recurrent Neural Networks , 2007 .

[29]  Ronald J. Williams,et al.  Gradient-based learning algorithms for recurrent networks and their computational complexity , 1995 .

[30]  Christopher Kermorvant,et al.  Dropout Improves Recurrent Neural Networks for Handwriting Recognition , 2013, 2014 14th International Conference on Frontiers in Handwriting Recognition.

[31]  Alex Graves,et al.  Generating Sequences With Recurrent Neural Networks , 2013, ArXiv.

[32]  Andrew W. Senior,et al.  Long short-term memory recurrent neural network architectures for large scale acoustic modeling , 2014, INTERSPEECH.

[33]  Alex Graves,et al.  Neural Turing Machines , 2014, ArXiv.

[34]  Volkmar Frinken,et al.  Keyword Spotting in Online Handwritten Documents Containing Text and Non-text Using BLSTM Neural Networks , 2011, 2011 International Conference on Document Analysis and Recognition.

[35]  Jürgen Schmidhuber,et al.  Multidimensional Recurrent Neural Networks , 2007 .

[36]  Björn W. Schuller,et al.  Abandoning emotion classes - towards continuous emotion recognition with modelling of long-range dependencies , 2008, INTERSPEECH.

[37]  Volkmar Frinken,et al.  Mode Detection in Online Handwritten Documents Using BLSTM Neural Networks , 2012, 2012 International Conference on Frontiers in Handwriting Recognition.

[38]  Jürgen Schmidhuber,et al.  Finding temporal structure in music: blues improvisation with LSTM recurrent networks , 2002, Proceedings of the 12th IEEE Workshop on Neural Networks for Signal Processing.

[39]  Samy Bengio,et al.  Order Matters: Sequence to sequence for sets , 2015, ICLR.

[40]  Jürgen Schmidhuber,et al.  Unconstrained On-line Handwriting Recognition with Recurrent Neural Networks , 2007, NIPS.

[41]  Navdeep Jaitly,et al.  Hybrid speech recognition with Deep Bidirectional LSTM , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.

[42]  Jeffrey L. Elman,et al.  Finding Structure in Time , 1990, Cogn. Sci..

[43]  Michael C. Mozer,et al.  Induction of Multiscale Temporal Structure , 1991, NIPS.

[44]  Björn W. Schuller,et al.  From speech to letters - using a novel neural network architecture for grapheme based ASR , 2009, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding.

[45]  Michael I. Jordan Attractor dynamics and parallelism in a connectionist sequential machine , 1990 .

[46]  Jürgen Schmidhuber,et al.  LSTM can Solve Hard Long Time Lag Problems , 1996, NIPS.

[47]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[48]  Jürgen Schmidhuber,et al.  Learning to Forget: Continual Prediction with LSTM , 2000, Neural Computation.

[49]  Sepp Hochreiter,et al.  Untersuchungen zu dynamischen neuronalen Netzen , 1991 .

[50]  Geoffrey E. Hinton,et al.  Grammar as a Foreign Language , 2014, NIPS.

[51]  Marvin Minsky,et al.  Perceptrons: An Introduction to Computational Geometry , 1969 .

[52]  Geoffrey E. Hinton,et al.  Learning internal representations by error propagation , 1986 .

[53]  Wojciech Zaremba,et al.  An Empirical Exploration of Recurrent Network Architectures , 2015, ICML.

[54]  Bram Bakker,et al.  Reinforcement Learning with Long Short-Term Memory , 2001, NIPS.

[55]  Phil Blunsom,et al.  Recurrent Continuous Translation Models , 2013, EMNLP.

[56]  Jürgen Schmidhuber,et al.  A comparison between spiking and differentiable recurrent neural networks on spoken digit recognition , 2004, Neural Networks and Computational Intelligence.

[57]  Christopher Kermorvant,et al.  Handwritten Information Extraction from Historical Census Documents , 2013, 2013 12th International Conference on Document Analysis and Recognition.

[58]  Michael J. Frank,et al.  Making Working Memory Work: A Computational Model of Learning in the Prefrontal Cortex and Basal Ganglia , 2006, Neural Computation.

[59]  Jürgen Schmidhuber,et al.  Multi-dimensional Recurrent Neural Networks , 2007, ICANN.

[60]  Jürgen Schmidhuber,et al.  Learning Precise Timing with LSTM Recurrent Networks , 2003, J. Mach. Learn. Res..

[61]  Marvin Minsky,et al.  Perceptrons: An Introduction to Computational Geometry, Expanded Edition , 1987 .

[62]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[63]  Sebastian Otte,et al.  Local Feature Based Online Mode Detection with Recurrent Neural Networks , 2012, 2012 International Conference on Frontiers in Handwriting Recognition.

[64]  Quoc V. Le,et al.  Listen, Attend and Spell , 2015, ArXiv.

[65]  Ralf C. Staudemeyer,et al.  Applying long short-term memory recurrent neural networks to intrusion detection , 2015 .

[66]  Samy Bengio,et al.  Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[67]  Nitish Srivastava,et al.  Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.

[68]  Nitish Srivastava,et al.  Improving Neural Networks with Dropout , 2013 .

[69]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[70]  Marcus Liwicki,et al.  A novel approach to on-line handwriting recognition based on bidirectional long short-term memory networks , 2007 .

[71]  Yoshua Bengio,et al.  Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies , 2001 .

[72]  Björn W. Schuller,et al.  Robust speech recognition using long short-term memory recurrent neural networks for hybrid acoustic modelling , 2014, INTERSPEECH.

[73]  J. Schmidhuber,et al.  Framewise phoneme classification with bidirectional LSTM networks , 2005, Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005..

[74]  Wojciech Zaremba,et al.  Learning to Execute , 2014, ArXiv.

[75]  Yoshua Bengio,et al.  Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.

[76]  Quoc V. Le,et al.  A Neural Conversational Model , 2015, ArXiv.

[77]  Geoffrey E. Hinton,et al.  Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[78]  Trevor Darrell,et al.  Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[79]  Jürgen Schmidhuber,et al.  Phoneme recognition in TIMIT with BLSTM-CTC , 2008, ArXiv.

[80]  P J Webros BACKPROPAGATION THROUGH TIME: WHAT IT DOES AND HOW TO DO IT , 1990 .

[81]  Yoshua Bengio,et al.  Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.