DKT: Diverse Knowledge Transfer Transformer for Class Incremental Learning

In class incremental learning, deep neural networks are prone to catastrophic forgetting: the accuracy on old classes declines substantially as new knowledge is learned. While recent studies have sought to address this issue, most approaches either suffer from the stability-plasticity dilemma or incur excessive computational and parameter overhead. To tackle these challenges, we propose a novel framework, the Diverse Knowledge Transfer Transformer (DKT), which incorporates two attention-based knowledge transfer mechanisms that convey both task-general and task-specific knowledge to the current task, together with a duplex classifier that addresses the stability-plasticity dilemma. Additionally, we design a loss function that clusters similar categories and discriminates between old and new tasks in the feature space. The proposed method adds only a small number of extra parameters, and this overhead remains negligible as the number of tasks grows. Extensive experiments on the CIFAR100, ImageNet100, and ImageNet1000 datasets demonstrate that our method outperforms competitive methods and achieves state-of-the-art performance. Our source code is available at https://github.com/MIV-XJTU/DKT.
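To make the two ideas above concrete, the following PyTorch sketch shows one plausible way to (a) let current-task tokens attend over learned task-general and task-specific knowledge tokens and (b) split classification across a stable head for old classes and a plastic head for new classes. The module names, token counts, dimensions, and the choice to freeze the old head are illustrative assumptions only; they are not the authors' reference implementation (see the repository linked above for that).

```python
# Minimal sketch of attention-based knowledge transfer plus a duplex classifier.
# All shapes and hyperparameters below are assumptions for illustration.
import torch
import torch.nn as nn


class KnowledgeTransferBlock(nn.Module):
    """Lets current-task features attend over stored knowledge tokens."""

    def __init__(self, dim: int, num_heads: int = 4, num_tokens: int = 8):
        super().__init__()
        # Task-general tokens are shared across tasks; task-specific tokens
        # would be grown (and frozen) per task in a full incremental setup.
        self.general_tokens = nn.Parameter(torch.randn(1, num_tokens, dim) * 0.02)
        self.specific_tokens = nn.Parameter(torch.randn(1, num_tokens, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) current-task patch/feature tokens.
        b = x.size(0)
        knowledge = torch.cat(
            [self.general_tokens.expand(b, -1, -1),
             self.specific_tokens.expand(b, -1, -1)], dim=1)
        # Queries come from the current task; keys/values from stored knowledge.
        transferred, _ = self.attn(query=x, key=knowledge, value=knowledge)
        return self.norm(x + transferred)


class DuplexClassifier(nn.Module):
    """Two heads: a stable head for old classes, a plastic head for new ones."""

    def __init__(self, dim: int, num_old: int, num_new: int):
        super().__init__()
        self.old_head = nn.Linear(dim, num_old)
        self.new_head = nn.Linear(dim, num_new)
        # Freezing the old head is one simple way to favor stability here.
        for p in self.old_head.parameters():
            p.requires_grad = False

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return torch.cat([self.old_head(feat), self.new_head(feat)], dim=-1)


if __name__ == "__main__":
    x = torch.randn(2, 16, 192)                        # dummy token features
    feats = KnowledgeTransferBlock(dim=192)(x).mean(1)  # pool into one feature
    logits = DuplexClassifier(192, num_old=10, num_new=10)(feats)
    print(logits.shape)                                  # torch.Size([2, 20])
```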
