MEMO: A Deep Network for Flexible Combination of Episodic Memories

Recent research developing neural network architectures with external memory has often used the benchmark bAbI question answering dataset, which provides a challenging set of tasks that require reasoning. Here we employed a classic associative inference task from the memory-based reasoning neuroscience literature in order to more carefully probe the reasoning capacity of existing memory-augmented architectures. This task is thought to capture the essence of reasoning – the appreciation of distant relationships among elements distributed across multiple facts or memories. Surprisingly, we found that current architectures struggle to reason over long-distance associations. Similar results were obtained on a more complex task involving finding the shortest path between nodes in a graph. We therefore developed MEMO, an architecture endowed with the capacity to reason over longer distances. This was accomplished with the addition of two novel components. First, it introduces a separation between the memories (facts) stored in external memory and the individual items that comprise those facts. Second, it makes use of an adaptive retrieval mechanism, allowing a variable number of ‘memory hops’ before the answer is produced. MEMO is capable of solving our novel reasoning tasks, as well as matching state-of-the-art results on bAbI.
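
To make the second component concrete, below is a minimal sketch of an adaptive retrieval loop in the style of Adaptive Computation Time: the controller attends over an external memory for a variable number of ‘hops’ and halts once its accumulated halting probability is nearly exhausted. This is an illustrative sketch, not the paper's implementation; the plain softmax attention, the residual state update, and the names and shapes (w_halt, eps, max_hops) are assumptions made for this example.

```python
# Hedged sketch of ACT-style adaptive memory retrieval (not the authors' code).
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_retrieval(query, memory, w_halt, max_hops=10, eps=0.01):
    """query: (d,) vector; memory: (n_slots, d) matrix of stored facts;
    w_halt: (d,) halting weights. Returns the halting-weighted answer state."""
    state = query
    answer = np.zeros_like(query)
    remainder = 1.0  # probability mass not yet spent on halting
    for hop in range(max_hops):
        # One memory hop: content-based attention over the memory slots.
        scores = memory @ state
        read = softmax(scores) @ memory
        state = state + read  # residual update of the controller state
        # Halting probability for this hop (ACT-style accumulation).
        p = sigmoid(w_halt @ state)
        if hop == max_hops - 1 or remainder - p < eps:
            answer += remainder * state  # spend the remaining mass and stop
            break
        answer += p * state
        remainder -= p
    return answer

# Toy usage: 4 memory slots of dimension 8, random query.
rng = np.random.default_rng(0)
memory = rng.normal(size=(4, 8))
print(adaptive_retrieval(rng.normal(size=8), memory, rng.normal(size=8)))
```

The key property the sketch shows is that the number of hops is input-dependent: queries whose answers require chaining several stored facts can keep reading from memory, while simple lookups can halt after one hop.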
