Continuous Learning in Single-Incremental-Task Scenarios

It was recently shown that architectural, regularization, and rehearsal strategies can be used to train deep models sequentially on a number of disjoint tasks without forgetting previously acquired knowledge. However, these strategies remain unsatisfactory when the tasks are not disjoint but instead form a single incremental task (e.g., class-incremental learning). In this paper we point out the differences between multi-task and single-incremental-task scenarios and show that well-known approaches such as LwF, EWC, and SI are not well suited to incremental-task scenarios. We then propose a new approach, denoted AR1, which combines architectural and regularization strategies. AR1's memory and computation overhead is very small, making it suitable for online learning. When tested on CORe50 and iCIFAR-100, AR1 outperformed existing regularization strategies by a good margin.
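As a rough illustration of the regularization strategies the abstract refers to (EWC and SI both penalize changes to weights deemed important for earlier tasks), below is a minimal sketch of a quadratic-penalty loss in PyTorch. It is not the paper's AR1 method; the names `fisher`, `theta_star`, and `ewc_lambda` are illustrative assumptions.

```python
# Hypothetical sketch of an EWC/SI-style quadratic penalty (not AR1 itself).
# fisher[name]     -- per-parameter importance estimate (e.g., Fisher diagonal)
# theta_star[name] -- parameter values consolidated after the previous task(s)
import torch

def penalised_loss(task_loss, model, fisher, theta_star, ewc_lambda=1.0):
    """Return task_loss plus a penalty anchoring important weights."""
    penalty = torch.zeros((), device=task_loss.device)
    for name, p in model.named_parameters():
        if name in fisher:
            penalty = penalty + (fisher[name] * (p - theta_star[name]) ** 2).sum()
    return task_loss + 0.5 * ewc_lambda * penalty
```

In this kind of scheme, the importance estimates and anchor values are recomputed after each task, so that training on new data is pulled back toward solutions that still perform well on earlier tasks.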
