Dynamic Neural Turing Machine with Soft and Hard Addressing Schemes

We extend the neural Turing machine (NTM) into a dynamic neural Turing machine (D-NTM) by introducing a trainable memory addressing scheme. This scheme maintains two separate vectors for each memory cell: a content vector and an address vector. The separation allows the D-NTM to learn a wide variety of location-based addressing strategies, both linear and nonlinear. We implement the D-NTM with both continuous, differentiable and discrete, non-differentiable read/write mechanisms. We investigate how the model learns to read from and write to memory through experiments on the Facebook bAbI tasks, using both feedforward and GRU controllers, and show that the D-NTM outperforms NTM and LSTM baselines. We also provide an extensive analysis of our model and several NTM variants on the bAbI tasks, together with further experimental results on sequential pMNIST, the Stanford Natural Language Inference corpus, and the associative recall and copy tasks.
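To make the addressing scheme concrete, below is a minimal NumPy sketch of a memory whose cells pair a learnable address vector with a content vector, with one continuous (softmax) and one discrete (sampled) read path. The class name `DNTMMemorySketch`, the dot-product similarity, and the dimensions are illustrative assumptions, not the paper's exact formulation; in the paper the keys are emitted by a learned controller and the discrete variant is trained with REINFORCE.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

class DNTMMemorySketch:
    """Toy D-NTM-style memory: each cell pairs a learnable address
    vector with a content vector, and reads attend over their
    concatenation (illustrative sketch only)."""

    def __init__(self, n_cells=128, content_dim=20, address_dim=8, seed=0):
        rng = np.random.default_rng(seed)
        # Address part: trainable across training, fixed within a forward pass.
        self.A = rng.normal(scale=0.1, size=(n_cells, address_dim))
        # Content part: written/overwritten by the controller.
        self.C = np.zeros((n_cells, content_dim))

    def soft_read(self, key):
        """Continuous (differentiable) addressing: softmax attention over
        the similarity between the key and each [address; content] row."""
        mem = np.concatenate([self.A, self.C], axis=1)   # (n_cells, address_dim + content_dim)
        weights = softmax(mem @ key)                     # soft attention weights over cells
        return weights @ self.C, weights                 # expected content read-out

    def hard_read(self, key, rng=None):
        """Discrete (non-differentiable) addressing: sample a single cell
        from the attention distribution."""
        _, weights = self.soft_read(key)
        rng = rng if rng is not None else np.random.default_rng()
        idx = rng.choice(len(weights), p=weights)
        return self.C[idx], idx

# Hypothetical usage: the read key matches address_dim + content_dim (8 + 20 = 28 here).
memory = DNTMMemorySketch()
query = np.random.default_rng(1).normal(size=28)
soft_value, attn = memory.soft_read(query)
hard_value, cell = memory.hard_read(query)
```

Keeping the address vectors separate from the content lets attention depend on learned per-cell locations rather than on the NTM's shift-based addressing, which is what allows the D-NTM to pick up both linear and nonlinear location-based strategies.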
