Representation Memorization for Fast Learning New Knowledge without Forgetting

The ability to quickly learn new knowledge (e.g., new classes or data distributions) is an important step towards human-level intelligence. In this paper, we consider scenarios that require learning new classes or data distributions quickly and incrementally over time, as often occurs in dynamic real-world environments. We propose “Memory-based Hebbian Parameter Adaptation” (Hebb) to tackle the two major challenges towards this goal (i.e., catastrophic forgetting and sample efficiency) in a unified framework. To mitigate catastrophic forgetting, Hebb augments a regular neural classifier with a continuously updated memory module that stores representations of previous data. To improve sample efficiency, we propose a parameter adaptation method based on the well-known Hebbian theory [Hebb, 1949], which directly “wires” the output network’s parameters with similar representations retrieved from the memory. We empirically verify the superior performance of Hebb through extensive experiments on a wide range of learning tasks (image classification, language modeling) and learning scenarios (continual, incremental, and online). We demonstrate that Hebb effectively mitigates catastrophic forgetting, and that it indeed learns new knowledge better and faster than the current state of the art.
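
To make the mechanism concrete, the following is a minimal sketch of the kind of memory-based Hebbian adaptation the abstract describes, assuming a linear softmax output layer and a key-value memory of (representation, label) pairs. The class name, hyperparameters, and cosine-similarity retrieval are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch only: a key-value memory over past representations plus
# a Hebbian outer-product update of the output-layer weights. All names and
# hyperparameters here are hypothetical, not the paper's exact method.
import numpy as np

class HebbianMemoryClassifier:
    def __init__(self, feat_dim, num_classes, lr=0.1, k=5):
        self.W = np.zeros((num_classes, feat_dim))  # output-layer weights
        self.keys = []    # stored representations of previously seen examples
        self.labels = []  # their class labels
        self.lr, self.k = lr, k

    def write(self, h, y):
        """Store the representation h of an example together with its label y."""
        self.keys.append(np.asarray(h, dtype=float))
        self.labels.append(int(y))

    def adapt(self, h):
        """Retrieve the k most similar stored representations and 'wire' them
        into the output weights with a Hebbian (pre x post) update."""
        if not self.keys:
            return
        K = np.stack(self.keys)
        sims = K @ h / (np.linalg.norm(K, axis=1) * np.linalg.norm(h) + 1e-8)
        for i in np.argsort(sims)[-self.k:]:
            target = np.eye(self.W.shape[0])[self.labels[i]]  # one-hot label
            # Hebb's rule: strengthen weights between co-active units,
            # scaled by the retrieved neighbor's similarity to h.
            self.W += self.lr * sims[i] * np.outer(target, self.keys[i])

    def predict(self, h):
        self.adapt(h)  # local, memory-driven adaptation before prediction
        return int(np.argmax(self.W @ h))
```

In a practical system, the brute-force similarity scan above would presumably be replaced by an approximate nearest-neighbor index (e.g., product quantization, reference [6]) to keep retrieval fast as the memory grows.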

[1] Yang Song, et al. Class-Balanced Loss Based on Effective Number of Samples, 2019, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2] Boi Faltings, et al. Memory Augmented Neural Model for Incremental Session-based Recommendation, 2020, IJCAI.

[3] Cordelia Schmid, et al. End-to-End Incremental Learning, 2018, ECCV.

[4] A. Emin Orhan, et al. A Simple Cache Model for Image Recognition, 2018, NeurIPS.

[5] Michael McCloskey, et al. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem, 1989, Psychology of Learning and Motivation.

[6] Cordelia Schmid, et al. Product Quantization for Nearest Neighbor Search, 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, Computational Linguistics.

[8] Kilian Q. Weinberger, et al. Densely Connected Convolutional Networks, 2017, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9] Kenneth O. Stanley, et al. Backpropamine: training self-modifying neural networks with differentiable neuromodulated plasticity, 2018, ICLR.

[10] Mark Sandler, et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks, 2018, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11] Geoffrey E. Hinton, et al. Using Fast Weights to Attend to the Recent Past, 2016, NIPS.

[12] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[13] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.

[14] Tsendsuren Munkhdalai, et al. Metalearning with Hebbian Fast Weights, 2018, arXiv.

[15] Richard Socher, et al. Pointer Sentinel Mixture Models, 2016, ICLR.

[16] Razvan Pascanu, et al. Overcoming catastrophic forgetting in neural networks, 2016, Proceedings of the National Academy of Sciences.

[17] Moustapha Cissé, et al. Unbounded cache model for online language modeling with open vocabulary, 2017, NIPS.

[18] Alex Graves, et al. Neural Turing Machines, 2014, arXiv.

[19] Kenneth O. Stanley, et al. Differentiable plasticity: training plastic neural networks with backpropagation, 2018, ICML.

[20] Razvan Pascanu, et al. Memory-based Parameter Adaptation, 2018, ICLR.

[21] Sebastian Ruder, et al. Episodic Memory in Lifelong Language Learning, 2019, NeurIPS.

[22] James T. Kwok, et al. Generalizing from a Few Examples, 2019, ACM Comput. Surv.

[23] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24] Richard Socher, et al. Regularizing and Optimizing LSTM Language Models, 2017, ICLR.

[25] Daan Wierstra, et al. One-shot Learning with Memory-Augmented Neural Networks, 2016, arXiv.

[26] Christoph H. Lampert, et al. iCaRL: Incremental Classifier and Representation Learning, 2017, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27] Aurko Roy, et al. Learning to Remember Rare Events, 2017, ICLR.

[28] D. O. Hebb. The Organization of Behavior: A Neuropsychological Theory, 1949.

[29] Peter Dayan, et al. Fast Parametric Learning with Activation Memorization, 2018, ICML.

[30] Yang Liu, et al. Learning to Remember Translation History with a Continuous Cache, 2017, TACL.

[31] Sergey Levine, et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.

[32] Nicolas Usunier, et al. Improving Neural Language Models with a Continuous Cache, 2016, ICLR.