Mixture Uniform Distribution Modeling and Asymmetric Mix Distillation for Class Incremental Learning

Exemplar rehearsal-based methods with knowledge distillation (KD) have been widely used in class incremental learning (CIL) scenarios. However, they still suffer from performance degradation because of a severe distribution discrepancy between the training and test sets, caused by the limited storage memory available for previous classes. In this paper, we mathematically model the data distribution and the discrepancy at the incremental stages with a mixture uniform distribution (MUD). We then propose an asymmetric mix distillation method to uniformly minimize the error of each class from the distribution discrepancy perspective. Specifically, we first extend mixup to CIL scenarios with incremental mix samplers and an incremental mix factor to calibrate the raw training data distribution. Next, mix distillation label augmentation is incorporated into the data distribution to inherit knowledge from the previous models. Trained on the resulting augmented data distribution, our model effectively alleviates the performance degradation, and extensive experimental results validate that our method exhibits superior performance on CIL benchmarks.
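
The abstract describes mixing new-class samples with rehearsal exemplars and augmenting the mixed labels with soft targets from the previous model. The PyTorch sketch below illustrates that general idea only; the Beta-distributed mixing factor, the paired new/exemplar batches, and the loss weighting are our own assumptions, not the paper's actual incremental mix samplers, incremental mix factor, or asymmetric distillation loss.

```python
import torch
import torch.nn.functional as F
from torch.distributions import Beta

def mix_distillation_step(new_model, old_model, x_new, y_new, x_old, y_old,
                          num_classes, alpha=0.2, temperature=2.0):
    """One illustrative training step: mixup between a new-class batch and an
    exemplar batch, with KD soft labels from the frozen previous model.
    All hyperparameters here are placeholders, not the paper's settings."""
    lam = Beta(alpha, alpha).sample().item()       # assumed mixing factor
    x_mix = lam * x_new + (1.0 - lam) * x_old      # standard mixup interpolation

    logits = new_model(x_mix)

    # Hard-label part: interpolate the one-hot targets of the two sources.
    y_new_1h = F.one_hot(y_new, num_classes).float()
    y_old_1h = F.one_hot(y_old, num_classes).float()
    ce_target = lam * y_new_1h + (1.0 - lam) * y_old_1h
    loss_ce = -(ce_target * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

    # Distillation part: soft labels from the previous-stage model on the mixed input,
    # restricted to the classes that model knows about.
    with torch.no_grad():
        old_logits = old_model(x_mix)
    old_dim = old_logits.size(1)
    loss_kd = F.kl_div(
        F.log_softmax(logits[:, :old_dim] / temperature, dim=1),
        F.softmax(old_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2

    return loss_ce + loss_kd
```

In an actual CIL training loop, `old_model` would be the frozen model from the previous incremental stage and `x_old`, `y_old` would come from the exemplar memory; how the mixing factor and sampling are scheduled across stages is exactly what the proposed incremental mix samplers and mix factor control.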
