Gradient Episodic Memory for Continual Learning

A major obstacle to progress in AI is the poor ability of models to learn new problems quickly without forgetting previously acquired knowledge. To better understand this issue, we study the problem of continual learning, where the model observes examples from a sequence of tasks once and one at a time. First, we propose a set of metrics to evaluate models learning over a continuum of data. These metrics characterize models not only by their test accuracy, but also by their ability to transfer knowledge across tasks. Second, we propose a model for continual learning, called Gradient Episodic Memory (GEM), that alleviates forgetting while allowing beneficial transfer of knowledge to previous tasks. Our experiments on variants of the MNIST and CIFAR-100 datasets demonstrate the strong performance of GEM compared to the state of the art.
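The abstract does not spell out the proposed metrics, so the following is only a minimal sketch of how accuracy and transfer across tasks could be summarized. It assumes a matrix `R` where `R[i][j]` is the test accuracy on task `j` after the model finishes training on task `i`, and a vector `b` of test accuracies at random initialization. The function name `continual_metrics` and the example numbers are hypothetical and used purely for illustration.

```python
from typing import Sequence, Tuple


def continual_metrics(
    R: Sequence[Sequence[float]], b: Sequence[float]
) -> Tuple[float, float, float]:
    """Summarize a continual-learning run (illustrative sketch).

    R[i][j]: test accuracy on task j after training on tasks 0..i.
    b[j]:    test accuracy on task j at random initialization.
    Returns (average accuracy, backward transfer, forward transfer).
    """
    T = len(R)
    if T < 2 or len(b) != T or any(len(row) != T for row in R):
        raise ValueError("need a square T x T matrix with T >= 2 and len(b) == T")

    # Average accuracy over all tasks after the last task has been learned.
    acc = sum(R[T - 1][j] for j in range(T)) / T

    # Backward transfer: how learning later tasks changed accuracy on earlier
    # ones. Negative values indicate forgetting.
    bwt = sum(R[T - 1][j] - R[j][j] for j in range(T - 1)) / (T - 1)

    # Forward transfer: accuracy on a task just before training on it,
    # relative to the random-initialization baseline.
    fwt = sum(R[j - 1][j] - b[j] for j in range(1, T)) / (T - 1)

    return acc, bwt, fwt


# Made-up accuracies for three tasks, for illustration only.
R_example = [
    [0.9, 0.2, 0.1],
    [0.8, 0.9, 0.3],
    [0.7, 0.8, 0.9],
]
b_example = [0.1, 0.1, 0.1]
acc, bwt, fwt = continual_metrics(R_example, b_example)
print(f"ACC={acc:.2f} BWT={bwt:.2f} FWT={fwt:.2f}")
```

In this toy run the negative backward transfer reflects accuracy lost on the first two tasks after the third is learned, while the positive forward transfer reflects above-baseline accuracy on tasks not yet trained on.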
