Task-agnostic Continual Learning with Hybrid Probabilistic Models

Learning new tasks continuously, without forgetting, under a constantly changing data distribution is essential for real-world problems but extremely challenging for modern deep learning. In this work we propose HCL, a Hybrid generative-discriminative approach to Continual Learning for classification. We model the distribution of each task and each class with a normalizing flow. The flow is used to learn the data distribution, perform classification, identify task changes, and avoid forgetting, all leveraging the invertibility and exact likelihood that are uniquely enabled by normalizing flows. We use the generative capabilities of the flow to avoid catastrophic forgetting through generative replay and a novel functional regularization technique. For task identification, we use state-of-the-art anomaly detection techniques based on measuring the typicality of the model's statistics. We demonstrate the strong performance of HCL on a range of continual learning benchmarks such as split-MNIST, split-CIFAR, and SVHN-MNIST.
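
To make the approach concrete, below is a minimal sketch of two of the core mechanisms described above: classification by exact per-class flow likelihoods, and generative replay from the flows of previously seen classes. The `Flow` interface with `log_prob(x)` and `sample(n)` is a hypothetical stand-in (e.g. a RealNVP- or Glow-style model), and the code is an illustration of the idea rather than the authors' implementation.

```python
import torch

class HybridFlowClassifier:
    def __init__(self, flows):
        # flows: dict mapping class label -> normalizing flow modeling p_k(x)
        self.flows = flows

    def classify(self, x):
        # Predict the class whose flow assigns x the highest exact
        # log-likelihood (Bayes rule under a uniform class prior).
        labels = list(self.flows.keys())
        log_probs = torch.stack(
            [self.flows[k].log_prob(x) for k in labels], dim=1
        )  # shape: (batch, num_classes)
        return [labels[int(i)] for i in log_probs.argmax(dim=1)]

    def replay_batch(self, n_per_class):
        # Generative replay: sample labeled data from the flows of
        # previously seen classes to rehearse alongside the current task.
        xs, ys = [], []
        for label, flow in self.flows.items():
            xs.append(flow.sample(n_per_class))
            ys.extend([label] * n_per_class)
        return torch.cat(xs), ys
```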

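Task-change detection can likewise be sketched. Under a typicality test, the per-batch average log-likelihood of in-distribution data concentrates around a typical value, so a batch whose statistic falls outside a calibrated band signals a new task. Everything below is a hedged sketch: the `n_sigma` band, the batch statistic, the flow's `forward` method, and the particular form of the functional regularizer (penalizing drift of the flow's latent mapping from a frozen snapshot on replayed points) are assumptions, not the paper's exact procedure.

```python
import torch

def fit_typicality_stats(flow, loader):
    # Calibrate the typicality test: estimate the mean and standard
    # deviation of the per-batch average log-likelihood on batches
    # drawn from the current task.
    scores = torch.tensor([flow.log_prob(x).mean().item() for x, _ in loader])
    return scores.mean().item(), scores.std().item()

def is_task_change(flow, batch, mean, std, n_sigma=3.0):
    # Flag a task change when the batch's average log-likelihood is
    # atypical, i.e. falls outside an n_sigma band around the mean.
    score = flow.log_prob(batch).mean().item()
    return abs(score - mean) > n_sigma * std

def functional_reg_loss(flow, frozen_flow, x_replay):
    # One plausible functional regularizer: keep the current flow's
    # latent mapping close to a frozen snapshot of itself on replayed
    # points, so the learned function changes little on old data.
    z_new = flow.forward(x_replay)
    with torch.no_grad():
        z_old = frozen_flow.forward(x_replay)
    return ((z_new - z_old) ** 2).mean()
```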