Toward Training Recurrent Neural Networks for Lifelong Learning

Catastrophic forgetting and capacity saturation are the central challenges of any parametric lifelong learning system. In this work, we study these challenges in the context of sequential supervised learning, with an emphasis on recurrent neural networks. To evaluate models in the lifelong learning setting, we propose a simple and intuitive curriculum-based benchmark in which models are trained on tasks of increasing difficulty. To measure the impact of catastrophic forgetting, the model is evaluated on all previously seen tasks each time it completes a new one. As a step toward developing true lifelong learning systems, we unify gradient episodic memory (an approach for alleviating catastrophic forgetting) and Net2Net (an approach for expanding model capacity). Both methods were originally proposed for feedforward networks; we evaluate the feasibility of applying them to recurrent networks. Evaluation on the proposed benchmark shows that the unified model is better suited to the lifelong learning setting than either constituent model alone.
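
To make the setup concrete, the sketch below illustrates the two ingredients the abstract describes: training on a curriculum of increasingly difficult tasks while re-testing on all earlier tasks to measure forgetting, and a GEM-style gradient projection that keeps updates from increasing loss on an episodic memory of past tasks. The helper names (`train_on_task`, `evaluate`) and the single-constraint projection are illustrative assumptions, not the paper's exact implementation; full GEM solves a quadratic program with one constraint per previous task, and the unified model additionally expands capacity with Net2Net.

```python
import numpy as np

def project_gradient(g, g_mem):
    """GEM-style update (single-constraint special case): if the
    current-task gradient g conflicts with the gradient g_mem computed
    on the episodic memory of past tasks (negative inner product),
    project g so that the loss on the memory does not increase."""
    dot = float(g @ g_mem)
    if dot < 0.0:
        g = g - (dot / float(g_mem @ g_mem)) * g_mem
    return g

def run_lifelong_benchmark(model, tasks, train_on_task, evaluate):
    """Train on tasks in curriculum order (easiest first). After
    finishing task i, evaluate on tasks 0..i so that forgetting on
    every earlier task can be quantified."""
    scores = {}  # scores[(i, j)] = performance on task j after training task i
    for i, task in enumerate(tasks):
        train_on_task(model, task)      # fit the current task
        for j in range(i + 1):          # revisit every task seen so far
            scores[(i, j)] = evaluate(model, tasks[j])
    return scores
```

A Net2Net-style capacity-expansion step, which would widen the recurrent layers between tasks while preserving the function already learned, is omitted from this sketch for brevity.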
