Self-Attentive Associative Memory

Neural networks with external memory have so far been restricted to a single memory with lossy representations of memory interactions. Richly representing the relationships between memory pieces calls for a high-order, segregated relational memory. In this paper, we propose to separate the storage of individual experiences (item memory) from the storage of their occurring relationships (relational memory). The idea is implemented through a novel Self-attentive Associative Memory (SAM) operator. Founded upon the outer product, SAM forms a set of associative memories that represent the hypothetical high-order relationships between arbitrary pairs of memory elements, through which a relational memory is constructed from an item memory. The two memories are wired into a single sequential model capable of both memorization and relational reasoning. Our proposed two-memory model achieves competitive results on a diverse set of machine learning tasks, from challenging synthetic problems to practical testbeds such as geometry, graph, reinforcement learning, and question answering.
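As a rough illustration of the outer-product idea, the sketch below builds a small set of associative matrices from an item memory via self-attention: slots attend to each other, and outer products of attended pairs are pooled into per-head relational matrices. The names (sam_sketch, item_memory, n_heads) and the specific projection and pooling choices are assumptions made for this example, not the paper's exact formulation.

# Minimal sketch, assuming an item memory of n slot vectors of size d.
# Not the authors' implementation; illustrative only.
import torch
import torch.nn.functional as F

def sam_sketch(item_memory: torch.Tensor, n_heads: int = 4) -> torch.Tensor:
    """item_memory: (n, d) -> relational tensor: (n_heads, d, d)."""
    n, d = item_memory.shape
    # Hypothetical random projections standing in for learned query/key/value maps.
    Wq = torch.randn(n_heads, d, d) / d ** 0.5
    Wk = torch.randn(d, d) / d ** 0.5
    Wv = torch.randn(d, d) / d ** 0.5

    keys = item_memory @ Wk       # (n, d)
    values = item_memory @ Wv     # (n, d)
    relational = []
    for h in range(n_heads):
        queries = item_memory @ Wq[h]                            # (n, d)
        attn = F.softmax(queries @ keys.t() / d ** 0.5, dim=-1)  # (n, n)
        attended = attn @ values                                 # (n, d)
        # Outer products between attended items and values capture
        # pairwise (second-order) associations; summing over slots
        # pools them into one d x d associative matrix per head.
        relational.append(torch.einsum('nd,ne->de', attended, values))
    return torch.stack(relational)  # (n_heads, d, d)

if __name__ == "__main__":
    M = torch.randn(8, 16)   # toy item memory: 8 slots, 16 dimensions
    R = sam_sketch(M)
    print(R.shape)           # torch.Size([4, 16, 16])

In this toy version the relational tensor is simply a stack of pooled outer-product matrices; the point is only to show how pairwise associations between item-memory slots can be materialized as a higher-order memory separate from the items themselves.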
