CaSpeR: Latent Spectral Regularization for Continual Learning

While biological intelligence grows organically as new knowledge is gathered throughout life, Artificial Neural Networks forget catastrophically whenever they face a changing training data distribution. Rehearsal-based Continual Learning (CL) approaches have been established as a versatile and reliable solution to overcome this limitation; however, sudden input disruptions and memory constraints are known to alter the consistency of their predictions. We study this phenomenon by investigating the geometric characteristics of the learner’s latent space and find that replayed data points of different classes increasingly mix up, interfering with classification. Hence, we propose a geometric regularizer that enforces weak requirements on the Laplacian spectrum of the latent space, promoting a partitioning behavior. We show that our proposal, called Continual Spectral Regularizer (CaSpeR), can be easily combined with any rehearsal-based CL approach and improves the performance of SOTA methods on standard benchmarks. Finally, we conduct additional analysis to provide insights into CaSpeR’s effects and applicability.
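The core idea lends itself to a compact implementation. Below is a minimal PyTorch sketch of a spectral regularizer in the spirit of the abstract: it builds a kNN similarity graph over the latent representations of replayed samples, forms the symmetric normalized Laplacian, and penalizes the sum of the first k eigenvalues while rewarding the (k+1)-th, nudging the graph toward k well-separated clusters (one per buffered class). The function name `casper_style_loss`, the kNN graph construction, and all hyperparameters are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F


def casper_style_loss(latents: torch.Tensor, k: int, knn: int = 10) -> torch.Tensor:
    """Spectral regularizer sketch (assumed form, not the paper's exact recipe).

    Encourages the kNN graph over replayed latents to split into roughly k
    connected components, one per class present in the buffer batch.

    latents: (B, D) features of replayed samples; k: number of classes in the batch.
    """
    B = latents.size(0)
    assert B > k, "need more buffered samples than classes"

    z = F.normalize(latents, dim=1)                    # unit-norm features
    sim = z @ z.t()                                    # (B, B) cosine similarities

    # Sparsify: keep each point's strongest neighbours (self included),
    # then symmetrize so the adjacency matrix describes an undirected graph.
    idx = sim.topk(min(knn + 1, B), dim=1).indices
    mask = torch.zeros_like(sim).scatter_(1, idx, 1.0)
    w = sim.clamp(min=0) * torch.max(mask, mask.t())
    w = w * (1.0 - torch.eye(B, device=w.device))      # remove self-loops

    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    d = w.sum(dim=1).clamp(min=1e-8).rsqrt()
    lap = torch.eye(B, device=w.device) - d[:, None] * w * d[None, :]

    evals = torch.linalg.eigvalsh(lap)                 # ascending, differentiable
    # Small first-k eigenvalues correspond to ~k connected components
    # (a Cheeger-style argument); a large (k+1)-th eigenvalue keeps those
    # components internally well connected rather than further fragmenting.
    return evals[:k].sum() - evals[k]
```

In a rehearsal method, this term would simply be added to the usual objective computed on buffer latents, e.g. `loss = stream_loss + replay_loss + rho * casper_style_loss(buffer_latents, k=num_buffer_classes)`, where `rho` is a hypothetical weighting coefficient; this matches the abstract's claim that the regularizer can be combined with any rehearsal-based CL approach.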
