A generalized LSTM-like training algorithm for second-order recurrent neural networks

The long short-term memory (LSTM) is a second-order recurrent neural network architecture that excels at storing sequential short-term memories and retrieving them many time-steps later. LSTM's original training algorithm provides the important properties of spatial and temporal locality, which are missing from other training approaches, at the cost of limiting its applicability to a small set of network architectures. Here we introduce the generalized long short-term memory (LSTM-g) training algorithm, which provides LSTM-like locality while being applicable without modification to a much wider range of second-order network architectures. With LSTM-g, all units have an identical set of operating instructions for both activation and learning, subject only to the configuration of their local environment in the network; this is in contrast to the original LSTM training algorithm, where each type of unit has its own activation and training instructions. When applied to LSTM architectures with peephole connections, LSTM-g takes advantage of an additional source of back-propagated error, which can enable better performance than the original algorithm. Enabled by the broad architectural applicability of LSTM-g, we demonstrate that training recurrent networks engineered for specific tasks can produce better results than single-layer networks. We conclude that LSTM-g has the potential to both improve the performance and broaden the applicability of spatially and temporally local gradient-based training algorithms for recurrent neural networks.
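To make the idea of a single, uniform operating rule for all units concrete, the following is a minimal sketch of how a generic second-order (gated) unit update might look. The function name, variable names, and the exact update rule are illustrative assumptions for exposition, not the equations from the paper; the point is only that every unit can run the same code, with gating arising from the multiplicative gain terms supplied by its local neighborhood.

```python
import math

def activate_unit(prev_state, inputs, gains, weights, self_gain=1.0):
    """Hypothetical sketch of a uniform gated-unit update (not the paper's
    exact formulation): the unit's state is carried over through a possibly
    gated self-connection, then accumulates gain * weight * input terms
    (the second-order interactions); the output squashes the state."""
    state = self_gain * prev_state
    for y_i, g_i, w_i in zip(inputs, gains, weights):
        state += g_i * w_i * y_i          # second-order term: gain * weight * input
    output = 1.0 / (1.0 + math.exp(-state))  # logistic squashing of the state
    return state, output

# Example usage with made-up values: two incoming connections, one fully
# open gate (gain 1.0) and one nearly closed gate (gain 0.1).
state, output = activate_unit(prev_state=0.5,
                              inputs=[0.8, -0.3],
                              gains=[1.0, 0.1],
                              weights=[0.6, 1.2])
```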
