Distributed associative memory network with memory refreshing loss

Despite recent progress in memory-augmented neural network (MANN) research, associative memory networks with a single external memory still show limited performance on complex relational reasoning tasks. In particular, content-based addressable memory networks often fail to encode input data into representations rich enough for relational reasoning, which limits the relation modeling performance of MANNs on long temporal sequence data. To address these problems, we introduce a novel Distributed Associative Memory architecture (DAM) with Memory Refreshing Loss (MRL), which enhances the relational reasoning performance of MANNs. Inspired by how the human brain works, our framework encodes data as a distributed representation across multiple memory blocks and repeatedly refreshes the contents for enhanced memorization, similar to the rehearsal process of the brain. To do this, we replace the single external memory with a set of smaller associative memory blocks and update these sub-memory blocks simultaneously and independently to form the distributed representation of the input data. Moreover, we propose MRL, which supports the task's target objective while learning the relational information present in the data. MRL enables a MANN to reinforce the association between input data and the task objective by reproducing stochastically sampled input data from the stored memory contents. With this procedure, the MANN further enriches the stored representations with relational information. In experiments, we apply our approaches to the Differentiable Neural Computer (DNC), one of the representative content-based addressing memory models, and achieve state-of-the-art performance on both memorization and relational reasoning tasks.
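The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no equations, so the sizes, the random projections standing in for learned weights, the round-robin slot allocation, the averaging of per-block reads, the mean-squared reconstruction error, and the loss weight of 0.1 are all assumptions made for illustration. The sketch only shows the shape of the approach: several small memory blocks written independently with their own projections, read with content-based addressing, and an extra loss term that asks the memory to reproduce a randomly sampled subset of the inputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: K blocks, N slots per block, slot width W, input size D, sequence length T.
K, N, W, D, T = 4, 8, 16, 10, 6

# One independent projection per block (random stand-ins for learned weights).
write_proj = rng.normal(scale=0.1, size=(K, D, W))
key_proj = rng.normal(scale=0.1, size=(K, D, W))
decoder = rng.normal(scale=0.1, size=(W, D))  # maps a read vector back to input space


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def content_read(block, key):
    """Content-based read: cosine similarity of the key against every slot."""
    sim = (block @ key) / (np.linalg.norm(block, axis=1) * np.linalg.norm(key) + 1e-8)
    return softmax(sim) @ block


def distributed_read(memory, x):
    """Read every block with its own key, then combine (plain average, an assumption)."""
    reads = np.stack([content_read(memory[k], x @ key_proj[k]) for k in range(K)])
    return reads.mean(axis=0)


def memory_refreshing_loss(memory, inputs, sample_prob=0.5):
    """Reconstruction error on a stochastically sampled subset of the inputs."""
    mask = rng.random(len(inputs)) < sample_prob
    if not mask.any():
        return 0.0
    errors = [np.mean((distributed_read(memory, x) @ decoder - x) ** 2) for x in inputs[mask]]
    return float(np.mean(errors))


inputs = rng.normal(size=(T, D))
memory = np.zeros((K, N, W))

# Write phase: every block stores its own projection of the same input.
for t, x in enumerate(inputs):
    for k in range(K):
        memory[k, t % N] = x @ write_proj[k]

mrl = memory_refreshing_loss(memory, inputs)
task_loss = 0.0  # placeholder for the task's own objective
total_loss = task_loss + 0.1 * mrl  # the weight 0.1 is an arbitrary choice
```

In a trained model the projections and the decoder would be learned jointly with the controller, so that minimizing the combined loss pushes the memory contents to retain enough information to reproduce the sampled inputs.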
