Distilling Causal Effect of Data in Class-Incremental Learning

We propose a causal framework to explain catastrophic forgetting in Class-Incremental Learning (CIL) and derive from it a novel distillation method that is orthogonal to existing anti-forgetting techniques such as data replay and feature/label distillation. We first 1) place CIL into the framework, 2) answer why forgetting happens: the causal effect of the old data is lost in the new training, and then 3) explain how the existing techniques mitigate it: they bring the causal effect back. Based on this framework, we propose to distill the Colliding Effect between the old and the new data, which is fundamentally equivalent to the causal effect of data replay but incurs no replay storage cost. Thanks to the causal effect analysis, we can further capture the Incremental Momentum Effect of the data stream; removing it helps retain the effect of the old data that would otherwise be overwhelmed by the effect of the new data, and thus alleviates the forgetting of old classes at test time. Extensive experiments on three CIL benchmarks (CIFAR-100, ImageNet-Sub, and ImageNet-Full) show that the proposed causal effect distillation improves various state-of-the-art CIL methods by a large margin (0.72%–9.06%).
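
For concreteness, below is a minimal PyTorch sketch of the two ideas as described above: weighting a logit-distillation loss by feature-space similarity among samples in a batch (standing in for distilling the colliding effect between old and new data), and subtracting part of a running bias direction at test time (standing in for removing the incremental momentum effect). This is not the authors' implementation; the function names, the neighbor count `k`, the temperature `T`, and the scaling factor `alpha` are illustrative assumptions.

```python
# Hypothetical sketch only; hyperparameters and aggregation scheme are assumptions.
import torch
import torch.nn.functional as F


def colliding_effect_distillation(feat_old, logits_old, logits_new, k=10, T=2.0):
    """Weight each sample's distillation target by its K nearest neighbors
    in the old model's feature space.

    feat_old:   (B, D) batch features from the frozen old model
    logits_old: (B, C_old) old-model logits for the same batch
    logits_new: (B, C_old) new-model logits restricted to the old classes
    """
    # Pairwise cosine similarity between samples in the batch.
    f = F.normalize(feat_old, dim=1)
    sim = f @ f.t()                                   # (B, B)
    sim.fill_diagonal_(-float("inf"))                 # exclude self-matches
    topk_sim, topk_idx = sim.topk(k=min(k, sim.size(0) - 1), dim=1)
    weights = F.softmax(topk_sim, dim=1)              # (B, K)

    # Aggregate each sample's neighbors' old logits into a soft target.
    soft_targets = (weights.unsqueeze(-1) * logits_old[topk_idx]).sum(dim=1)

    # Standard temperature-scaled KL distillation toward the aggregated targets.
    p_teacher = F.softmax(soft_targets / T, dim=1)
    log_p_student = F.log_softmax(logits_new / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T ** 2)


@torch.no_grad()
def debiased_logits(feat, classifier, momentum_dir, alpha=0.5):
    """Subtract a fraction of the feature's projection onto a running bias
    direction (accumulated mostly from the larger new-class data) before
    classification, so old classes are less overwhelmed at test time."""
    d = F.normalize(momentum_dir, dim=0)              # (D,) bias direction
    proj = (feat @ d).unsqueeze(1) * d                # component along that direction
    return classifier(feat - alpha * proj)
```

In this reading, the distillation term replaces the stored old exemplars with the new samples' own neighborhoods in the old feature space, while the inference-time correction only changes how logits are computed and requires no retraining.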
