Recurrent Neural Networks With External Addressable Long-Term and Working Memory for Learning Long-Term Dependences

Learning long-term dependences (LTDs) with recurrent neural networks (RNNs) is challenging because of their limited internal memory. In this paper, we propose a new external memory architecture for RNNs, the external addressable long-term and working memory (EALWM)-augmented RNN. This architecture has two distinct advantages over existing neural external memory architectures: the external memory is divided into two parts, a long-term memory and a working memory, both of which are addressable, and the network can learn LTDs without suffering from vanishing gradients, under the necessary assumptions. Experimental results on algorithm learning, language modeling, and question answering demonstrate that the proposed neural memory architecture is promising for practical applications.
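To make the memory split concrete, the following is a minimal NumPy sketch of a recurrent cell that reads from and writes to an external memory divided into a long-term part and a working part, both addressed by content (softmax over cosine similarity), in the spirit of NTM-style addressing. The class name EALWMCellSketch, the dimensions, and the write rules are illustrative assumptions, not the paper's exact EALWM equations.

```python
# Illustrative sketch only: an RNN cell conditioned on reads from two
# addressable external memories (long-term and working). The specific
# gating/update scheme here is an assumption, not the published model.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cosine_address(memory, key):
    """Attention weights over memory rows by cosine similarity to the key."""
    num = memory @ key
    den = np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8
    return softmax(num / den)

class EALWMCellSketch:
    def __init__(self, input_dim, hidden_dim, slots_lt, slots_wm, mem_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.Wx = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
        self.Wh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
        self.Wr = rng.normal(scale=0.1, size=(hidden_dim, 2 * mem_dim))
        self.Wk = rng.normal(scale=0.1, size=(mem_dim, hidden_dim))
        self.M_lt = np.zeros((slots_lt, mem_dim))  # long-term memory: changes slowly
        self.M_wm = np.zeros((slots_wm, mem_dim))  # working memory: refreshed each step

    def step(self, x, h):
        # Emit a key from the previous hidden state and address both memories.
        key = self.Wk @ h
        r_lt = cosine_address(self.M_lt, key) @ self.M_lt  # read from long-term memory
        r_wm = cosine_address(self.M_wm, key) @ self.M_wm  # read from working memory
        reads = np.concatenate([r_lt, r_wm])
        # Standard recurrent update, conditioned on both memory reads.
        h_new = np.tanh(self.Wx @ x + self.Wh @ h + self.Wr @ reads)
        # Illustrative writes: working memory is overwritten aggressively,
        # long-term memory is nudged only slightly so old content persists.
        w_wm = cosine_address(self.M_wm, key)
        self.M_wm += np.outer(w_wm, (self.Wk @ h_new) - w_wm @ self.M_wm)
        w_lt = cosine_address(self.M_lt, key)
        self.M_lt += 0.01 * np.outer(w_lt, self.Wk @ h_new)
        return h_new
```

The asymmetric write rates above reflect the division the abstract describes: the working part is rewritten rapidly to serve the current computation, while the long-term part is updated conservatively so that information relevant to distant time steps is preserved.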
