Universal Hopfield Networks: A General Framework for Single-Shot Associative Memory Models

A large number of neural network models of associative memory have been proposed in the literature. These include the classical Hopfield networks (HNs), sparse distributed memories (SDMs), and, more recently, the modern continuous Hopfield networks (MCHNs), which possess close links to self-attention in machine learning. In this paper, we propose a general framework for understanding the operation of such memory networks as a sequence of three operations: similarity, separation, and projection. We derive all of these memory models as instances of our general framework with differing similarity and separation functions. We extend the mathematical framework of Krotov & Hopfield (2020) to express general associative memory models using neural network dynamics with local computation, and derive a general energy function that is a Lyapunov function of the dynamics. Finally, using our framework, we empirically investigate the effect of using similarity functions beyond the standard dot-product measure in these associative memory models, and demonstrate that Euclidean and Manhattan distance similarity metrics perform substantially better in practice on many tasks, enabling more robust retrieval and higher memory capacity than existing models.
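To make the three-operation decomposition concrete, the following is a minimal NumPy sketch of single-shot retrieval as similarity, separation, and projection. It is an illustrative reconstruction from the description above, not the paper's reference implementation: the function name `retrieve`, the softmax-with-inverse-temperature separation, and the choice to project back through the stored patterns themselves (auto-association) are all assumptions. The dot-product/softmax setting recovers MCHN/attention-style retrieval, while the Euclidean and Manhattan options correspond to the alternative similarity metrics investigated empirically.

```python
# Minimal sketch of the similarity -> separation -> projection pipeline.
# Names and defaults are illustrative, not from the paper's codebase.
import numpy as np

def retrieve(M, q, similarity="euclidean", beta=10.0):
    """Single-shot associative recall of a stored pattern from a query.

    M    : (N, d) array of N stored memory vectors
    q    : (d,) query vector, e.g. a noisy or partial memory
    beta : inverse temperature controlling separation sharpness
    """
    # 1. Similarity: score the query against every stored memory.
    if similarity == "dot":
        scores = M @ q                           # MCHN / attention-style
    elif similarity == "euclidean":
        scores = -np.linalg.norm(M - q, axis=1)  # negative L2 distance
    elif similarity == "manhattan":
        scores = -np.abs(M - q).sum(axis=1)      # negative L1 distance
    else:
        raise ValueError(f"unknown similarity: {similarity}")

    # 2. Separation: sharpen the score distribution so the best match
    #    dominates (softmax here; beta -> inf approaches a hard max).
    weights = np.exp(beta * (scores - scores.max()))
    weights /= weights.sum()

    # 3. Projection: map the separated scores back to pattern space as a
    #    weighted combination of the stored memories (auto-association).
    return weights @ M

# Usage: store random patterns, then query with a corrupted copy of one.
rng = np.random.default_rng(0)
M = rng.standard_normal((100, 64))
q = M[3] + 0.3 * rng.standard_normal(64)
assert np.allclose(retrieve(M, q), M[3], atol=0.1)
```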

[1] Cengiz Pehlevan, et al. Attention Approximates Sparse Distributed Memory, 2021, NeurIPS.

[2] E. Brattico, et al. Rapid encoding of musical tones discovered in whole-brain connectivity, 2021, NeuroImage.

[3] Thomas Lukasiewicz, et al. Associative Memories via Predictive Coding, 2021, NeurIPS.

[4] Dmitry Krotov, et al. Hierarchical Associative Memory, 2021, arXiv.

[5] Jonathan Berant, et al. Memory-efficient Transformers via Top-k Attention, 2021, SustaiNLP.

[6] Fei Tang, et al. A remark on a paper of Krotov and Hopfield [arXiv:2008.06996], 2021, arXiv.

[7] A. Dosovitskiy, et al. MLP-Mixer: An all-MLP Architecture for Vision, 2021, NeurIPS.

[8] Omer Levy, et al. Transformer Feed-Forward Layers Are Key-Value Memories, 2020, EMNLP.

[9] Yi Tay, et al. Efficient Transformers: A Survey, 2020, ACM Computing Surveys.

[10] J. Hopfield, et al. Large Associative Memory Problem in Neurobiology and Machine Learning, 2020, ICLR.

[11] David P. Kreil, et al. Hopfield Networks is All You Need, 2020, ICLR.

[12] C. Pehlevan, et al. Associative Memory in Iterated Overparameterized Sigmoid Autoencoders, 2020, ICML.

[13] Han Fang, et al. Linformer: Self-Attention with Linear Complexity, 2020, arXiv.

[14] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.

[15] Yee Whye Teh, et al. Multiplicative Interactions and Where to Find Them, 2020, ICLR.

[16] Lukasz Kaiser, et al. Reformer: The Efficient Transformer, 2020, ICLR.

[17] Mikhail Belkin, et al. Overparameterized neural networks implement associative memory, 2019, Proceedings of the National Academy of Sciences.

[18] Hasan Şakir Bilge, et al. Deep Metric Learning: A Survey, 2019, Symmetry.

[19] Guillaume Lample, et al. Augmenting Self-attention with Persistent Memory, 2019, arXiv.

[20] Rajesh P. N. Rao, et al. Predictive Coding, 2019, A Blueprint for the Hard Problem of Consciousness.

[21] Mikhail Belkin, et al. Memorization in Overparameterized Autoencoders, 2018.

[22] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.

[23] Matthias Löwe, et al. On a Model of Associative Memory with Huge Storage Capacity, 2017, arXiv:1702.01929.

[24] John J. Hopfield, et al. Dense Associative Memory for Pattern Recognition, 2016, NIPS.

[25] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.

[26] Edmund T. Rolls, et al. The mechanisms for pattern completion and pattern separation in the hippocampus, 2013, Frontiers in Systems Neuroscience.

[27] Brian Kulis, et al. Metric Learning: A Survey, 2013, Foundations and Trends in Machine Learning.

[28] Wei Wu, et al. Storage Capacity of the Hopfield Network Associative Memory, 2012, Fifth International Conference on Intelligent Computation Technology and Automation.

[29] Alan L. Yuille, et al. The Concave-Convex Procedure, 2003, Neural Computation.

[30] Heinrich Niemann, et al. Storage Capacity of Kernel Associative Memories, 2002, ICANN.

[31] Jinwen Ma, et al. The asymptotic memory capacity of the generalized Hopfield network, 1999, Neural Networks.

[32] Pentti Kanerva, et al. Sparse distributed memory and related models, 1993.

[33] Louis A. Jaeckel. An alternative design for a sparse distributed memory, 1989.

[34] Pentti Kanerva, et al. Sparse Distributed Memory, 1988.

[35] J. Keeler. Comparison Between Kanerva's SDM and Hopfield-Type Neural Networks, 1988, Cognitive Science.

[36] M. Usher, et al. Capacities of multiconnected memory models, 1988.

[37] Abbott, et al. Storage capacity of generalized networks, 1987, Physical Review A.

[38] Baldi, et al. Number of stable points for spin-glasses and neural networks of higher orders, 1987, Physical Review Letters.

[39] C. L. Giles, et al. High order correlation model for associative memory, 1987.

[40] Yaser S. Abu-Mostafa, et al. Information capacity of the Hopfield model, 1985, IEEE Transactions on Information Theory.

[41] J. J. Hopfield. Neurons with graded response have collective computational properties like those of two-state neurons, 1984, Proceedings of the National Academy of Sciences.

[42] J. J. Hopfield. Neural networks and physical systems with emergent collective computational abilities, 1982, Proceedings of the National Academy of Sciences.

[43] S. Kirkpatrick, et al. Infinite-ranged models of spin-glasses, 1978.

[44] W. Little. The existence of persistent states in the brain, 1974, Mathematical Biosciences.

[45] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[46] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.

[47] Jordan L. Boyd-Graber, et al. Language Models, 2009, Encyclopedia of Database Systems.

[48] Liu Yang. An Overview of Distance Metric Learning, 2007.

[49] Terrence J. Sejnowski, et al. Associative Memory and Hippocampal Place Cells, 1995.