On the memory properties of recurrent neural models

In this paper, we investigate the memory properties of two popular gated units: the long short-term memory (LSTM) unit and the gated recurrent unit (GRU), which have been used in recurrent neural networks (RNNs) to achieve state-of-the-art performance on several machine learning tasks. We propose five basic tasks for isolating and examining specific capabilities relating to the implementation of memory. Results show that (i) both types of gated unit perform less reliably than standard RNN units on tasks testing fixed-delay recall, (ii) the reliability of stochastic gradient descent decreases as network complexity increases, and (iii) gated units perform better than standard RNN units on tasks that require values to be stored in memory and updated conditionally on the input to the network. Task performance is found to be surprisingly independent of network depth (number of layers) and connection architecture. Finally, visualisations of the solutions found by these networks are presented and explored, exposing for the first time how logic operations are implemented by individual gated cells and by small groups of these cells.
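The fixed-delay recall task mentioned in the abstract can be made concrete with a minimal sketch. The function below, its name, parameters, and the exact input/target format are illustrative assumptions rather than the authors' task specification: at each time step the network receives a random value and must output the value it received a fixed number of steps earlier.

```python
# Minimal sketch (assumed setup, not the paper's exact specification) of a
# fixed-delay recall dataset: the target at step t is the input at step t - delay.
import numpy as np

def make_fixed_delay_recall(n_sequences=128, seq_len=50, delay=5, seed=0):
    rng = np.random.default_rng(seed)
    # Inputs: random scalars in [0, 1), shaped (batch, time, features=1).
    x = rng.random((n_sequences, seq_len, 1))
    # Targets: the input shifted forward by `delay` steps; the first `delay`
    # positions have no defined target, so they are zeroed and masked out.
    y = np.zeros_like(x)
    y[:, delay:, :] = x[:, :-delay, :]
    mask = np.zeros((n_sequences, seq_len, 1), dtype=bool)
    mask[:, delay:, :] = True
    return x, y, mask

if __name__ == "__main__":
    x, y, mask = make_fixed_delay_recall()
    print(x.shape, y.shape, int(mask.sum()))  # (128, 50, 1) (128, 50, 1) 5760
```

A recurrent network (standard RNN, LSTM, or GRU) trained on such sequences must hold each input in memory for exactly `delay` steps, which is what isolates the fixed-delay recall capability from other aspects of sequence modelling.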
