Efficient Lifelong Learning with A-GEM

In lifelong learning, the learner is presented with a sequence of tasks, incrementally building a data-driven prior that may be leveraged to speed up learning of a new task. In this work, we investigate the efficiency of current lifelong learning approaches in terms of sample complexity and computational and memory cost. Towards this end, we first introduce a new, more realistic evaluation protocol, whereby learners observe each example only once and hyper-parameter selection is done on a small and disjoint set of tasks that is not used for the actual learning and evaluation. Second, we introduce a new metric measuring how quickly a learner acquires a new skill. Third, we propose an improved version of GEM (Lopez-Paz & Ranzato, 2017), dubbed Averaged GEM (A-GEM), which matches or exceeds the performance of GEM while being almost as computation- and memory-efficient as EWC (Kirkpatrick et al., 2016) and other regularization-based methods. Finally, we show that all algorithms, including A-GEM, can learn even more quickly when provided with task descriptors specifying the classification tasks under consideration. Our experiments on several standard lifelong learning benchmarks demonstrate that A-GEM offers the best trade-off between accuracy and efficiency.
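
As its name suggests, A-GEM's efficiency gain comes from replacing GEM's one constraint per previous task with a single constraint built from a gradient averaged over a small batch drawn from the episodic memory. Below is a minimal NumPy sketch of that single-constraint projection; the function name agem_project and the toy usage are our own illustration under that reading of the method, not code from the paper.

    import numpy as np

    def agem_project(g, g_ref):
        """Project the current-task gradient so it does not conflict with the
        reference gradient averaged over a batch sampled from episodic memory.

        g, g_ref: flattened parameter gradients of equal length (1-D arrays).
        Returns g unchanged when the two gradients already agree; otherwise
        removes the component of g that points against g_ref.
        """
        dot = float(np.dot(g, g_ref))
        if dot >= 0.0:
            return g  # single inequality constraint already satisfied
        return g - (dot / float(np.dot(g_ref, g_ref))) * g_ref

    # Toy usage with a conflicting gradient pair.
    g = np.array([1.0, -2.0, 0.5])
    g_ref = np.array([0.5, 1.0, 0.0])
    g_tilde = agem_project(g, g_ref)
    assert np.dot(g_tilde, g_ref) >= -1e-12  # no longer conflicts with memory

Because this projection needs only one dot product and one vector update per step, its cost stays close to that of plain SGD, whereas GEM solves a quadratic program with one constraint per previous task.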

[1] Babak Saleh, et al. Write a Classifier: Predicting Visual Classifiers from Unstructured Text, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2] Christoph H. Lampert, et al. Learning to detect unseen object classes by between-class attribute transfer, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[3] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4] Tom Schaul, et al. Universal Value Function Approximators, 2015, ICML.

[5] Eric Eaton, et al. Using Task Features for Zero-Shot Knowledge Transfer in Lifelong Learning, 2016, IJCAI.

[6] Christoph H. Lampert, et al. Zero-Shot Learning—A Comprehensive Evaluation of the Good, the Bad and the Ugly, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7] Leslie Pack Kaelbling, et al. Modular meta-learning, 2018, CoRL.

[8] James Kirkpatrick, et al. Overcoming catastrophic forgetting in neural networks, 2016.

[9] Allan Jabri, et al. CommAI: Evaluating the first steps towards a useful general AI, 2017, ICLR.

[10] Christoph H. Lampert, et al. iCaRL: Incremental Classifier and Representation Learning, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11] Richard E. Turner, et al. Variational Continual Learning, 2017, ICLR.

[12] Tinne Tuytelaars, et al. Expert Gate: Lifelong Learning with a Network of Experts, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13] Matthew Riemer, et al. Routing Networks: Adaptive Selection of Non-linear Functions for Multi-Task Learning, 2017, ICLR.

[14] Sebastian Thrun, et al. Lifelong Learning Algorithms, 1998, Learning to Learn.

[15] Christoph H. Lampert, et al. Attribute-Based Classification for Zero-Shot Visual Object Categorization, 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16] Yann LeCun, et al. The MNIST database of handwritten digits, 2005.

[17] Thomas L. Griffiths, et al. Automatically Composing Representation Transformations as a Means for Generalization, 2018, ICLR.

[18] Zhanxing Zhu, et al. Reinforced Continual Learning, 2018, NeurIPS.

[19] Mark B. Ring. CHILD: A First Step Towards Continual Learning, 1997, Machine Learning.

[20] Patrick M. Pilarski, et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction, 2011, AAMAS.

[21] Surya Ganguli, et al. Continual Learning Through Synaptic Intelligence, 2017, ICML.

[22] Philip H. S. Torr, et al. Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence, 2018, ECCV.

[23] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.

[24] Chrisantha Fernando, et al. PathNet: Evolution Channels Gradient Descent in Super Neural Networks, 2017, arXiv.

[25] Jiwon Kim, et al. Continual Learning with Deep Generative Replay, 2017, NIPS.

[26] Marc'Aurelio Ranzato, et al. Gradient Episodic Memory for Continual Learning, 2017, NIPS.

[27] Pietro Perona, et al. The Caltech-UCSD Birds-200-2011 Dataset, 2011.

[28] Marcus Rohrbach, et al. Memory Aware Synapses: Learning what (not) to forget, 2017, ECCV.

[29] Yee Whye Teh, et al. Progress & Compress: A scalable framework for continual learning, 2018, ICML.

[30] Ji Zhang, et al. Large-Scale Visual Relationship Understanding, 2018, AAAI.