CODA-Prompt: COntinual Decomposed Attention-Based Prompting for Rehearsal-Free Continual Learning

Computer vision models suffer from a phenomenon known as catastrophic forgetting when learning novel concepts from continuously shifting training data. Typical solutions for this continual learning problem require extensive rehearsal of previously seen data, which increases memory costs and may violate data privacy. Recently, the emergence of large-scale pre-trained vision transformer models has enabled prompting approaches as an alternative to data rehearsal. These approaches rely on a key-query mechanism to generate prompts and have been found to be highly resistant to catastrophic forgetting in the well-established rehearsal-free continual learning setting. However, the key mechanism of these methods is not trained end-to-end with the task sequence. Our experiments show that this reduces their plasticity, sacrificing new-task accuracy, and prevents them from benefiting from expanded parameter capacity. We instead propose to learn a set of prompt components which are assembled with input-conditioned weights to produce input-conditioned prompts, resulting in a novel attention-based end-to-end key-query scheme. Our experiments show that we outperform the current SOTA method DualPrompt on established benchmarks by as much as 4.5% in average final accuracy. We also outperform the state of the art by as much as 4.4% in accuracy on a continual learning benchmark which contains both class-incremental and domain-incremental task shifts, corresponding to many practical settings. Our code is available at https://github.com/GT-RIPL/CODA-Prompt
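To make the assembly mechanism concrete, below is a minimal PyTorch sketch of the idea: a pool of learnable prompt components is weighted by the cosine similarity between an attention-masked query feature (e.g., the [CLS] embedding from the frozen ViT) and a learned key per component, and the weighted sum forms the input-conditioned prompt. The module name, shapes, and initialization here are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecomposedPrompt(nn.Module):
    """Sketch of decomposed attention-based prompt assembly (hypothetical names/shapes)."""

    def __init__(self, num_components=100, prompt_len=8, embed_dim=768):
        super().__init__()
        # M prompt components, each a short sequence of prompt tokens
        self.prompts = nn.Parameter(0.02 * torch.randn(num_components, prompt_len, embed_dim))
        # one key and one feature-attention vector per component
        self.keys = nn.Parameter(0.02 * torch.randn(num_components, embed_dim))
        self.attn = nn.Parameter(0.02 * torch.randn(num_components, embed_dim))

    def forward(self, query):
        # query: (B, D) image feature from the frozen backbone
        masked_q = query.unsqueeze(1) * self.attn.unsqueeze(0)                   # (B, M, D)
        weights = F.cosine_similarity(masked_q, self.keys.unsqueeze(0), dim=-1)  # (B, M)
        # assemble the final prompt as an input-conditioned weighted sum of components
        prompt = torch.einsum('bm,mld->bld', weights, self.prompts)              # (B, L, D)
        return prompt  # inserted into the frozen transformer's attention layers
```

Because the keys and attention vectors sit directly on the path that produces the prompt, gradients from the task loss update them jointly with the prompt components, which is the end-to-end property contrasted above with prior key-query prompting methods.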
