Improved multitask learning through synaptic intelligence

Deep learning has led to remarkable advances when applied to problems where the data distribution does not change over the course of learning. In stark contrast, biological neural networks continually adapt to changing domains and solve a diversity of tasks simultaneously. Furthermore, synapses in biological neurons are not simply real-valued scalars, but possess complex molecular machinery enabling non-trivial learning dynamics. In this study, we take a first step toward bringing this biological complexity into artificial neural networks. We introduce a model of intelligent synapses that accumulate task-relevant information over time, and exploit this information to efficiently consolidate memories of old tasks to protect them from being overwritten as new tasks are learned. We apply our framework to learning sequences of related classification problems, and show that it dramatically reduces catastrophic forgetting while maintaining computational efficiency.
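The abstract describes synapses that accumulate task-relevant information online and later use it to protect important weights from being overwritten. The sketch below is a rough illustration of that general idea only, not the authors' implementation: it keeps a per-parameter running credit of how much each weight contributed to reducing the loss during a task, converts that credit into an importance estimate when the task ends, and adds a quadratic penalty that discourages subsequent tasks from moving important weights. All names (`SynapticConsolidator`, `strength`, `damping`) and the exact normalization are illustrative assumptions.

```python
# Hypothetical sketch of per-synapse importance accumulation and a quadratic
# consolidation penalty. Class and argument names are illustrative, not taken
# from the paper.
import torch
import torch.nn as nn


class SynapticConsolidator:
    """Tracks a running per-parameter importance estimate and penalizes
    changes to parameters deemed important for previously learned tasks."""

    def __init__(self, model: nn.Module, strength: float = 0.1, damping: float = 1e-3):
        self.model = model
        self.strength = strength  # weight of the consolidation penalty
        self.damping = damping    # keeps the normalization well-behaved
        self.path_credit = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
        self.importance = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
        self.anchor = {n: p.detach().clone() for n, p in model.named_parameters()}
        self._prev = {n: p.detach().clone() for n, p in model.named_parameters()}

    def accumulate(self):
        """Call after each optimizer step: credit each parameter with the loss
        reduction attributable to its most recent update (-grad * delta)."""
        for n, p in self.model.named_parameters():
            if p.grad is not None:
                delta = p.detach() - self._prev[n]
                self.path_credit[n] -= p.grad.detach() * delta
            self._prev[n] = p.detach().clone()

    def consolidate(self):
        """Call at the end of a task: turn accumulated credit into a
        per-parameter importance and anchor the current weights."""
        for n, p in self.model.named_parameters():
            moved = (p.detach() - self.anchor[n]) ** 2 + self.damping
            self.importance[n] += torch.relu(self.path_credit[n]) / moved
            self.anchor[n] = p.detach().clone()
            self.path_credit[n].zero_()

    def penalty(self):
        """Quadratic surrogate loss pulling parameters toward their anchors,
        weighted by accumulated importance."""
        total = sum((self.importance[n] * (p - self.anchor[n]) ** 2).sum()
                    for n, p in self.model.named_parameters())
        return self.strength * total
```

In a hypothetical training loop, one would add `consolidator.penalty()` to the task loss before `backward()`, call `accumulate()` after every optimizer step, and call `consolidate()` when switching tasks, so that learning the new task is regularized toward parameter values that mattered for the old ones.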
