Model Behavior Preserving for Class-Incremental Learning

Deep models have shown to be vulnerable to catastrophic forgetting, a phenomenon that the recognition performance on old data degrades when a pre-trained model is fine-tuned on new data. Knowledge distillation (KD) is a popular incremental approach to alleviate catastrophic forgetting. However, it usually fixes the absolute values of neural responses for isolated historical instances, without considering the intrinsic structure of the responses by a convolutional neural network (CNN) model. To overcome this limitation, we recognize the importance of the global property of the whole instance set and treat it as a behavior characteristic of a CNN model relevant to model incremental learning. On this basis: 1) we design an instance neighborhood-preserving (INP) loss to maintain the order of pair-wise instance similarities of the old model in the feature space; 2) we devise a label priority-preserving (LPP) loss to preserve the label ranking lists within instance-wise label probability vectors in the output space; and 3) we introduce an efficient derivable ranking algorithm for calculating the two loss functions. Extensive experiments conducted on CIFAR100 and ImageNet show that our approach achieves the state-of-the-art performance.