Active Long Term Memory Networks

Continual learning in artificial neural networks suffers from interference and forgetting when different tasks are learned sequentially. This paper introduces the Active Long Term Memory Networks (A-LTM), a model of sequential multi-task deep learning that is able to maintain previously learned associations between sensory input and behavioral output while acquiring new knowledge. A-LTM exploits the non-convex nature of deep neural networks and actively maintains knowledge of previously learned, inactive tasks using a distillation loss. Distortions of the learned input-output map are penalized, but hidden layers are free to traverse towards new local optima that are more favorable for the multi-task objective. We re-frame McClelland's seminal hippocampal theory with respect to the Catastrophic Interference (CI) behavior exhibited by modern deep architectures trained with back-propagation and inhomogeneous sampling of latent factors across epochs. We present empirical results of non-trivial CI during continual learning in Deep Linear Networks trained on the same task, in Convolutional Neural Networks when the task shifts from predicting semantic to graphical factors, and during domain adaptation from simple to complex environments. We present results on the A-LTM model's ability to maintain viewpoint recognition learned in the highly controlled iLab-20M dataset, with 10 object categories and 88 camera viewpoints, while adapting to the unstructured domain of ImageNet with 1,000 object categories.
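
The retention mechanism described above, a distillation loss that penalizes distortions of a previously learned input-output map while a new task is trained, can be sketched as follows. This is a minimal PyTorch-style illustration, not the authors' implementation; the two-headed network interface net(x, head=...), the frozen snapshot teacher, and the weighting lam are assumptions made for the example.

```python
# Minimal sketch of a distillation-based retention loss in the spirit of A-LTM.
# Assumptions (not from the paper): a trunk-plus-heads network exposing
# net(x, head=...), and a frozen copy `teacher` snapshotted before the new task.
import torch
import torch.nn.functional as F

def altm_loss(net, teacher, x_new, y_new, x_old, T=2.0, lam=1.0):
    """Supervised loss on the active (new) task plus a distillation penalty
    that keeps the inactive (old) task's input-output map near the teacher."""
    # Active task: ordinary cross-entropy on new-task data.
    loss_new = F.cross_entropy(net(x_new, head="new"), y_new)

    # Inactive task: soft targets from the frozen teacher at temperature T.
    with torch.no_grad():
        soft_targets = F.softmax(teacher(x_old, head="old") / T, dim=1)
    student_log_probs = F.log_softmax(net(x_old, head="old") / T, dim=1)
    loss_old = F.kl_div(student_log_probs, soft_targets,
                        reduction="batchmean") * (T * T)

    # Hidden layers remain free to move; only the old task's outputs are anchored.
    return loss_new + lam * loss_old
```

In use, `teacher` would be a deep copy of the network taken before new-task training begins (e.g. `teacher = copy.deepcopy(net)` with gradients disabled), and `x_old` can come from stored or replayed inputs of the inactive task.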

[1]  D. O. Hebb  The organization of behavior: A neuropsychological theory , 1949 .

[2]  K. H. Stauder  [Psychology of the child] , 1953, Medizinische Klinik.

[3]  W. Scoville,et al.  Loss of recent memory after bilateral hippocampal lesions , 1957, Journal of neurology, neurosurgery, and psychiatry.

[4]  D. Hubel,et al.  Effects of visual deprivation on morphology and physiology of cells in the cat's lateral geniculate body , 1963, Journal of neurophysiology.

[5]  Geoffrey E. Hinton  Learning distributed representations of concepts , 1989 .

[6]  Lawrence D. Jackel,et al.  Handwritten Digit Recognition with a Back-Propagation Network , 1989, NIPS.

[7]  Michael McCloskey,et al.  Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem , 1989 .

[8]  Robert Hecht-Nielsen,et al.  Theory of the backpropagation neural network , 1989, International 1989 Joint Conference on Neural Networks.

[9]  Jack Mostow,et al.  Direct Transfer of Learned Information Among Neural Networks , 1991, AAAI.

[10]  G. Reeke The society of mind , 1991 .

[11]  Marvin Minsky,et al.  Society of Mind: A Response to Four Reviews , 1991, Artif. Intell..

[12]  Yann LeCun,et al.  Signature Verification Using A "Siamese" Time Delay Neural Network , 1993, Int. J. Pattern Recognit. Artif. Intell..

[13]  Rich Caruana,et al.  Learning Many Related Tasks at the Same Time with Backpropagation , 1994, NIPS.

[14]  James L. McClelland,et al.  Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. , 1995, Psychological review.

[15]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[16]  Rich Caruana,et al.  Multitask Learning , 1998, Encyclopedia of Machine Learning and Data Mining.

[17]  R. French Catastrophic forgetting in connectionist networks , 1999, Trends in Cognitive Sciences.

[18]  J. Crutchfield,et al.  Thermodynamic depth of causal states: Objective complexity via minimal representations , 1999 .

[19]  C. Shalizi,et al.  Causal architecture, complexity and self-organization in time series and cellular automata , 2001 .

[20]  A. Ishai,et al.  Distributed and Overlapping Representations of Faces and Objects in Ventral Temporal Cortex , 2001, Science.

[21]  Yoshua Bengio,et al.  Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies , 2001 .

[22]  Long Ji Lin,et al.  Self-improving reactive agents based on reinforcement learning, planning and teaching , 1992, Machine Learning.

[23]  Rich Caruana,et al.  Model compression , 2006, KDD '06.

[24]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[25]  Klaus-Robert Müller,et al.  Efficient BackProp , 2012, Neural Networks: Tricks of the Trade.

[26]  Pascal Vincent,et al.  Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Surya Ganguli,et al.  Learning hierarchical categories in deep neural networks , 2013, CogSci.

[28]  Ha Hong,et al.  Hierarchical Modular Optimization of Convolutional Networks Achieves Representations Similar to Macaque IT and Human Ventral Stream , 2013, NIPS.

[29]  Yoshua Bengio,et al.  An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks , 2013, ICLR.

[30]  Yoshua Bengio,et al.  How transferable are features in deep neural networks? , 2014, NIPS.

[31]  Surya Ganguli,et al.  Exact solutions to the nonlinear dynamics of learning in deep linear neural networks , 2013, ICLR.

[32]  Rich Caruana,et al.  Do Deep Nets Really Need to be Deep? , 2013, NIPS.

[33]  Andrea Vedaldi,et al.  MatConvNet: Convolutional Neural Networks for MATLAB , 2014, ACM Multimedia.

[34]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[35]  Joshua B. Tenenbaum,et al.  Deep Convolutional Inverse Graphics Network , 2015, NIPS.

[36]  Jianmin Wang,et al.  Learning Multiple Tasks with Deep Relationship Networks , 2015, ArXiv.

[37]  Matthew Richardson,et al.  Blending LSTMs into CNNs , 2015, ICLR 2016.

[38]  Matthew Richardson,et al.  Compressing LSTMs into CNNs , 2015, ArXiv.

[39]  Rauf Izmailov,et al.  Learning using privileged information: similarity control and knowledge transfer , 2015, J. Mach. Learn. Res..

[40]  Peter Kulchyski and , 2015 .

[41]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[42]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[43]  Razvan Pascanu,et al.  Policy Distillation , 2015, ICLR.

[44]  Ruslan Salakhutdinov,et al.  Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning , 2015, ICLR.

[45]  Tomas Mikolov,et al.  A Roadmap Towards Machine Intelligence , 2015, CICLing.

[46]  Matthew Richardson,et al.  Do Deep Convolutional Nets Really Need to be Deep (Or Even Convolutional)? , 2016, ArXiv.

[47]  Shie Mannor,et al.  Graying the black box: Understanding DQNs , 2016, ICML.

[48]  Joshua B. Tenenbaum,et al.  Building machines that learn and think like people , 2016, Behavioral and Brain Sciences.

[49]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[50]  Ali Borji,et al.  iLab-20M: A Large-Scale Controlled Object Dataset to Investigate Deep Learning , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Tomaso A. Poggio,et al.  Bridging the Gaps Between Residual Learning, Recurrent Neural Networks and Visual Cortex , 2016, ArXiv.

[52]  Joshua B. Tenenbaum,et al.  Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation , 2016, NIPS.

[53]  Bernhard Schölkopf,et al.  Unifying distillation and privileged information , 2015, ICLR.

[54]  Shie Mannor,et al.  A Deep Hierarchical Approach to Lifelong Learning in Minecraft , 2016, AAAI.

[55]  Matthew Richardson,et al.  Do Deep Convolutional Nets Really Need to be Deep and Convolutional? , 2016, ICLR.