Evolving memory-augmented neural architecture for deep memory problems

In this paper, we present a new memory-augmented neural network called the Gated Recurrent Unit with Memory Block (GRU-MB). Our architecture builds on the gated neural architecture of the Gated Recurrent Unit (GRU) and integrates an external memory block, similar to a Neural Turing Machine (NTM). GRU-MB interacts with the memory block through independent read and write gates that decouple the memory from the central feedforward operation. This allows for regimented memory access and updates, giving the network the ability to choose when to read from memory, update it, or simply ignore it. This ability to act in detachment from the memory lets the network shield the memory from noise and other distractions, while still using it to effectively retain and propagate information over extended periods of time. We evolve GRU-MB using neuroevolution and evaluate it on two deep memory tasks. Results show that GRU-MB learns significantly faster and more accurately than traditional memory-based methods, and is robust to dramatic increases in the depth of these tasks.
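
For concreteness, here is a minimal NumPy sketch of the gating idea the abstract describes: a read gate controls how much of the stored memory enters the hidden computation, and a write gate controls how much of the new hidden state is committed back to memory, so with both gates near zero the memory is bypassed and shielded from noisy updates. The specific gate equations, the single-vector memory, and all names (GRUMBCell, step) are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUMBCell:
    """Toy GRU-style cell coupled to an external memory vector (illustrative)."""

    def __init__(self, n_in, n_hid, seed=0):
        rng = np.random.default_rng(seed)
        n_cat = n_in + n_hid  # input concatenated with a hidden-sized vector
        W = lambda: rng.normal(0.0, 0.1, (n_hid, n_cat))
        self.Wr = W()  # read-gate weights (hypothetical)
        self.Ww = W()  # write-gate weights (hypothetical)
        self.Wh = W()  # candidate hidden-state weights

    def step(self, x, h, m):
        """One time step; returns (new hidden state, new memory)."""
        zr = sigmoid(self.Wr @ np.concatenate([x, h]))      # read gate
        read = zr * m                                       # gated read: zr ~ 0 ignores memory
        h_new = np.tanh(self.Wh @ np.concatenate([x, read]))
        zw = sigmoid(self.Ww @ np.concatenate([x, h_new]))  # write gate
        m_new = (1.0 - zw) * m + zw * h_new                 # gated write: zw ~ 0 shields memory
        return h_new, m_new

# Usage: run a toy 4-step input sequence through the cell.
cell = GRUMBCell(n_in=4, n_hid=8)
h, m = np.zeros(8), np.zeros(8)
for x in np.eye(4):
    h, m = cell.step(x, h, m)
print(h.shape, m.shape)  # (8,) (8,)
```

In this sketch the gates are what decouple memory from the feedforward pass: the hidden update sees only the gated read, and the memory changes only where the write gate opens, which is one plausible way to realize the selective read/update/ignore behavior described above. In the paper the weights would be found by neuroevolution rather than gradient descent.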
