The Kanerva Machine: A Generative Distributed Memory

We present an end-to-end trained memory system that quickly adapts to new data and generates samples that resemble it. Inspired by Kanerva's sparse distributed memory, it has a robust distributed reading and writing mechanism. The memory is analytically tractable, which enables optimal on-line compression via a Bayesian update rule. We formulate it as a hierarchical conditional generative model, in which memory provides a rich data-dependent prior distribution. Consequently, top-down memory and bottom-up perception are combined to produce the code representing an observation. Empirically, we demonstrate that the adaptive memory significantly improves generative models trained on both the Omniglot and CIFAR datasets. Compared with the Differentiable Neural Computer (DNC) and its variants, our memory model has greater capacity and is significantly easier to train.
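The "analytically tractable" memory with an "optimal on-line compression via a Bayesian update rule" can be illustrated with a linear-Gaussian memory: the memory is a matrix with a Gaussian distribution over its rows, an observation's code is addressed by a weight vector, and writing is a conjugate (Kalman-style) posterior update. The sketch below is an assumption-laden simplification, not the paper's full model: function names (`write`, `read`), the scalar observation-noise variance, and the shapes are illustrative choices.

```python
import numpy as np

def write(R, U, w, z, noise_var=1.0):
    """One Bayesian (conjugate Gaussian) update of a linear memory.

    R: (K, C) posterior mean of the memory matrix (K slots, C code dims)
    U: (K, K) posterior row-covariance of the memory
    w: (K,)   addressing weights for this observation
    z: (C,)   latent code to be stored
    """
    delta = z - w @ R               # prediction error for this code
    sigma = w @ U @ w + noise_var   # scalar predictive variance
    gain = U @ w / sigma            # (K,) Kalman-style gain vector
    R_new = R + np.outer(gain, delta)        # mean moves toward z
    U_new = U - np.outer(gain, w @ U)        # uncertainty shrinks
    return R_new, U_new

def read(R, w):
    """Mean of the code retrieved at address w."""
    return w @ R
```

Because the update is conjugate, each write both reduces the reconstruction error at the written address and contracts the memory's covariance, which is what makes repeated on-line writes behave like optimal compression of the observed codes.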
