Efficient Lifelong Learning with A-GEM

In lifelong learning, the learner is presented with a sequence of tasks, incrementally building a data-driven prior that may be leveraged to speed up learning of a new task. In this work, we investigate the efficiency of current lifelong learning approaches in terms of sample complexity and computational and memory cost. Towards this end, we first introduce a new, more realistic evaluation protocol, whereby learners observe each example only once and hyper-parameter selection is done on a small and disjoint set of tasks that is not used for the actual learning and evaluation. Second, we introduce a new metric measuring how quickly a learner acquires a new skill. Third, we propose an improved version of GEM (Lopez-Paz & Ranzato, 2017), dubbed Averaged GEM (A-GEM), which matches or exceeds the performance of GEM while being almost as computation- and memory-efficient as EWC (Kirkpatrick et al., 2016) and other regularization-based methods. Finally, we show that all algorithms, including A-GEM, can learn even more quickly when provided with task descriptors specifying the classification tasks under consideration. Our experiments on several standard lifelong learning benchmarks demonstrate that A-GEM offers the best trade-off between accuracy and efficiency.
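
As its name suggests, A-GEM's efficiency gain comes from replacing GEM's one constraint per previous task with a single constraint built from a gradient averaged over a small batch drawn from the episodic memory. Below is a minimal NumPy sketch of that single-constraint projection; the function name agem_project and the toy usage are our own illustration under that reading of the method, not code from the paper.

    import numpy as np

    def agem_project(g, g_ref):
        """Project the current-task gradient so it does not conflict with the
        reference gradient averaged over a batch sampled from episodic memory.

        g, g_ref: flattened parameter gradients of equal length (1-D arrays).
        Returns g unchanged when the two gradients already agree; otherwise
        removes the component of g that points against g_ref.
        """
        dot = float(np.dot(g, g_ref))
        if dot >= 0.0:
            return g  # single inequality constraint already satisfied
        return g - (dot / float(np.dot(g_ref, g_ref))) * g_ref

    # Toy usage with a conflicting gradient pair.
    g = np.array([1.0, -2.0, 0.5])
    g_ref = np.array([0.5, 1.0, 0.0])
    g_tilde = agem_project(g, g_ref)
    assert np.dot(g_tilde, g_ref) >= -1e-12  # no longer conflicts with memory

Because this projection needs only one dot product and one vector update per step, its cost stays close to that of plain SGD, whereas GEM solves a quadratic program with one constraint per previous task.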

[1] Babak Saleh, et al. Write a Classifier: Predicting Visual Classifiers from Unstructured Text, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2] Christoph H. Lampert, et al. Learning to detect unseen object classes by between-class attribute transfer, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[3] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4] Tom Schaul, et al. Universal Value Function Approximators, 2015, ICML.

[5] Eric Eaton, et al. Using Task Features for Zero-Shot Knowledge Transfer in Lifelong Learning, 2016, IJCAI.

[6] Christoph H. Lampert, et al. Zero-Shot Learning—A Comprehensive Evaluation of the Good, the Bad and the Ugly, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7] Leslie Pack Kaelbling, et al. Modular meta-learning, 2018, CoRL.

[8] James Kirkpatrick, et al. Overcoming catastrophic forgetting in neural networks, 2016.

[9] Allan Jabri, et al. CommAI: Evaluating the first steps towards a useful general AI, 2017, ICLR.

[10] Christoph H. Lampert, et al. iCaRL: Incremental Classifier and Representation Learning, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11] Richard E. Turner, et al. Variational Continual Learning, 2017, ICLR.

[12] Tinne Tuytelaars, et al. Expert Gate: Lifelong Learning with a Network of Experts, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13] Matthew Riemer, et al. Routing Networks: Adaptive Selection of Non-linear Functions for Multi-Task Learning, 2017, ICLR.

[14] Sebastian Thrun, et al. Lifelong Learning Algorithms, 1998, Learning to Learn.

[15] Christoph H. Lampert, et al. Attribute-Based Classification for Zero-Shot Visual Object Categorization, 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16] Yann LeCun, et al. The MNIST database of handwritten digits, 2005.

[17] Thomas L. Griffiths, et al. Automatically Composing Representation Transformations as a Means for Generalization, 2018, ICLR.

[18] Zhanxing Zhu, et al. Reinforced Continual Learning, 2018, NeurIPS.

[19] Mark B. Ring. CHILD: A First Step Towards Continual Learning, 1997, Machine Learning.

[20] Patrick M. Pilarski, et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction, 2011, AAMAS.

[21] Surya Ganguli, et al. Continual Learning Through Synaptic Intelligence, 2017, ICML.

[22] Philip H. S. Torr, et al. Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence, 2018, ECCV.

[23] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.

[24] Chrisantha Fernando, et al. PathNet: Evolution Channels Gradient Descent in Super Neural Networks, 2017, arXiv.

[25] Jiwon Kim, et al. Continual Learning with Deep Generative Replay, 2017, NIPS.

[26] Marc'Aurelio Ranzato, et al. Gradient Episodic Memory for Continual Learning, 2017, NIPS.

[27] Pietro Perona, et al. The Caltech-UCSD Birds-200-2011 Dataset, 2011.

[28] Marcus Rohrbach, et al. Memory Aware Synapses: Learning what (not) to forget, 2017, ECCV.

[29] Yee Whye Teh, et al. Progress & Compress: A scalable framework for continual learning, 2018, ICML.

[30] Ji Zhang, et al. Large-Scale Visual Relationship Understanding, 2018, AAAI.