Initial Classifier Weights Replay for Memoryless Class Incremental Learning

Incremental Learning (IL) is useful when artificial systems need to deal with streams of data and do not have access to all data at all times. The most challenging setting requires constant deep-model complexity and incremental model updates without access to a memory of past data. In this setting, the representations of past classes are strongly affected by catastrophic forgetting. To mitigate its negative effect, an adapted fine-tuning procedure that includes knowledge distillation is usually deployed. We propose a different approach built on a vanilla fine-tuning backbone. It leverages initial classifier weights, which provide a strong representation of past classes because they were trained with all data of those classes. However, the magnitudes of classifiers learned in different incremental states vary, and normalization is needed to handle all classes fairly. Normalization is performed by standardizing the initial classifier weights, which are assumed to be normally distributed. In addition, prediction scores are calibrated with state-level statistics to further improve classification fairness. We conduct a thorough evaluation on four public datasets in a memoryless incremental-learning setting. Results show that our method outperforms existing techniques by a large margin on large-scale datasets.
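
Since the abstract does not give the exact formulas, the following is a minimal NumPy sketch of the two core operations it describes: standardizing each initial classifier's weights, and calibrating raw prediction scores with a per-state statistic. The function names (standardize_weights, calibrate_scores) and the choice of a per-state mean raw score as the calibration statistic are illustrative assumptions, not the authors' exact procedure.

    import numpy as np

    def standardize_weights(W):
        # Standardize each class's weight vector to zero mean and unit
        # variance, under the stated assumption that the initial classifier
        # weights are normally distributed.
        # W: array of shape (num_classes, feature_dim).
        mu = W.mean(axis=1, keepdims=True)
        sigma = W.std(axis=1, keepdims=True)
        return (W - mu) / sigma

    def calibrate_scores(scores, class_to_state, state_stat):
        # Divide each class's raw score by a statistic of the incremental
        # state in which the class was first learned (here, a hypothetical
        # per-state mean raw score), so that classes learned in different
        # states compete on a fair footing.
        factors = np.array([state_stat[class_to_state[c]]
                            for c in range(scores.shape[-1])])
        return scores / factors

    # Toy usage: two incremental states with three classes each, 8-dim features.
    rng = np.random.default_rng(0)
    W_init = rng.normal(size=(6, 8))    # replayed initial classifier weights
    W_std = standardize_weights(W_init)

    class_to_state = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
    state_stat = {0: 1.4, 1: 2.1}       # illustrative per-state mean scores
    features = rng.normal(size=8)
    raw_scores = W_std @ features       # shape (num_classes,)
    fair_scores = calibrate_scores(raw_scores, class_to_state, state_stat)

Standardization puts every classifier on a comparable scale regardless of when it was trained, while the score calibration corrects the remaining state-level bias; both steps operate only on stored weights and statistics, so no exemplars of past data are needed.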
