FFNB: Forgetting-Free Neural Blocks for Deep Continual Visual Learning

Deep neural networks (DNNs) have recently achieved great success in computer vision and several related fields. Despite this progress, current neural architectures still suffer from catastrophic interference (a.k.a. forgetting), which prevents DNNs from learning continually. While several state-of-the-art methods have been proposed to mitigate forgetting, these existing solutions are either highly rigid (as with regularization) or time- and memory-demanding (as with replay). An intermediate class of methods, based on dynamic networks, has been proposed in the literature and provides a reasonable balance between task memorization and computational footprint. In this paper, we devise a dynamic network architecture for continual learning based on a novel forgetting-free neural block (FFNB). Training FFNB features on new tasks is achieved using a novel procedure that constrains the underlying parameters to the null-space of the previous tasks, while training classifier parameters amounts to Fisher discriminant analysis. The latter provides an effective incremental process which is also optimal from a Bayesian perspective. The trained features and classifiers are further enhanced using an incremental “end-to-end” fine-tuning. Extensive experiments, conducted on different challenging classification problems, show the high effectiveness of the proposed method.
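The two ingredients named above, null-space constrained updates and a Fisher discriminant classifier, can be illustrated with a minimal NumPy sketch. This is not the FFNB procedure itself: the single linear layer, the SVD-based projector, the two-class closed-form discriminant, and all names (`null_space_projector`, `fisher_discriminant`, `X_old`, `reg`) are assumptions made for illustration only.

```python
import numpy as np

def null_space_projector(X_old, tol=1e-10):
    """Orthogonal projector onto the null space of X_old (rows = old-task inputs).

    Illustrative assumption: one linear layer y = x @ W, so any update dW with
    X_old @ dW = 0 leaves the layer's outputs on the old inputs unchanged.
    """
    _, s, vt = np.linalg.svd(X_old, full_matrices=True)
    rank = int(np.sum(s > tol * s.max()))
    N = vt[rank:].T              # columns: orthonormal basis of the null space
    return N @ N.T

def fisher_discriminant(X, y, reg=1e-6):
    """Two-class Fisher discriminant: w = Sw^{-1} (mu1 - mu0), midpoint bias."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter, lightly regularized (hypothetical `reg`) for stability.
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    w = np.linalg.solve(Sw + reg * np.eye(X.shape[1]), mu1 - mu0)
    b = -0.5 * w @ (mu0 + mu1)   # decision boundary halfway between class means
    return w, b

rng = np.random.default_rng(0)

# Null-space constrained step on a toy layer.
X_old = rng.normal(size=(5, 8))  # 5 old-task inputs in 8 dimensions
W = rng.normal(size=(8, 4))      # current layer weights
grad = rng.normal(size=(8, 4))   # stand-in for a new-task gradient
P = null_space_projector(X_old)
W_new = W - 0.1 * (P @ grad)     # projected update
old_outputs_preserved = np.allclose(X_old @ W, X_old @ W_new)

# Fisher discriminant on two well-separated synthetic classes.
X = np.vstack([rng.normal(-3.0, 1.0, size=(50, 2)),
               rng.normal(+3.0, 1.0, size=(50, 2))])
y = np.repeat([0, 1], 50)
w, b = fisher_discriminant(X, y)
accuracy = float(np.mean((X @ w + b > 0).astype(int) == y))
```

In this toy setting the projected step changes the weights without changing the layer's responses on the old inputs, which is the sense in which a null-space constraint avoids interference. The discriminant has a closed form and depends only on class means and within-class scatter, quantities that can be accumulated as data arrive; how FFNB actually organizes these computations is not specified in the abstract.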
