Pseudo-Rehearsal for Continual Learning with Normalizing Flows

Catastrophic forgetting (CF) occurs when a neural network overwrites past knowledge while being trained on new tasks. Common techniques to handle CF include regularization of the weights (using, e.g., their importance on past tasks) and rehearsal strategies, in which the network is continually re-trained on past data. Generative models have also been applied to the latter, providing an endless source of rehearsal data. In this paper, we propose a novel method that combines the strengths of regularization-based and generative rehearsal approaches. Our generative model is a normalizing flow (NF), a probabilistic and invertible neural network, trained on the internal embeddings of the network. By keeping a single NF conditioned on the task identity, we show that the memory overhead remains constant as tasks are added. In addition, exploiting the invertibility of the NF, we propose a simple approach to regularize the network's embeddings with respect to past tasks. We show that our method performs favorably against state-of-the-art approaches in the literature, with bounded computation and memory overheads.
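
To make the idea concrete, the sketch below is a minimal, hypothetical PyTorch implementation of a task-conditioned normalizing flow over fixed-size embeddings; the class and argument names (ConditionalCoupling, TaskConditionedFlow, emb_dim, n_tasks) are illustrative choices, not the architecture used in the paper. The flow is trained by maximizing the change-of-variables log-likelihood, log p(h | t) = log p_Z(f_t(h)) + log |det J_{f_t}(h)|, where h is an embedding and t the task; because f_t is invertible, the same model can be run backwards to draw pseudo-embeddings for rehearsal.

import math
import torch
import torch.nn as nn


class ConditionalCoupling(nn.Module):
    """One affine coupling layer: half of the embedding is rescaled and
    shifted by a network that also sees a learned task embedding."""

    def __init__(self, emb_dim, n_tasks, hidden=256, flip=False):
        super().__init__()
        assert emb_dim % 2 == 0, "assumes an even embedding size for simplicity"
        self.half, self.flip = emb_dim // 2, flip
        self.task_emb = nn.Embedding(n_tasks, hidden)
        self.net = nn.Sequential(
            nn.Linear(self.half + hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, emb_dim))

    def _split(self, v):
        a, b = v[:, :self.half], v[:, self.half:]
        return (b, a) if self.flip else (a, b)

    def _merge(self, kept, transformed):
        return torch.cat([transformed, kept], 1) if self.flip \
            else torch.cat([kept, transformed], 1)

    def forward(self, x, task):
        x1, x2 = self._split(x)
        log_s, t = self.net(torch.cat([x1, self.task_emb(task)], 1)).chunk(2, 1)
        log_s = torch.tanh(log_s)                  # keep scales numerically tame
        return self._merge(x1, x2 * log_s.exp() + t), log_s.sum(1)

    def inverse(self, y, task):
        y1, y2 = self._split(y)
        log_s, t = self.net(torch.cat([y1, self.task_emb(task)], 1)).chunk(2, 1)
        return self._merge(y1, (y2 - t) * torch.exp(-torch.tanh(log_s)))


class TaskConditionedFlow(nn.Module):
    """A single flow shared by all tasks; conditioning on the task identity
    keeps the memory overhead constant as new tasks arrive."""

    def __init__(self, emb_dim, n_tasks, n_layers=6):
        super().__init__()
        self.emb_dim = emb_dim
        self.layers = nn.ModuleList(
            ConditionalCoupling(emb_dim, n_tasks, flip=bool(i % 2))
            for i in range(n_layers))

    def log_prob(self, x, task):
        log_det = torch.zeros(x.size(0), device=x.device)
        for layer in self.layers:
            x, ld = layer(x, task)
            log_det = log_det + ld
        # change of variables with a standard Gaussian base density
        log_base = -0.5 * (x ** 2 + math.log(2 * math.pi)).sum(1)
        return log_base + log_det

    @torch.no_grad()
    def sample(self, n, task_id):
        device = next(self.parameters()).device
        z = torch.randn(n, self.emb_dim, device=device)
        task = torch.full((n,), task_id, dtype=torch.long, device=device)
        for layer in reversed(self.layers):        # run the flow backwards
            z = layer.inverse(z, task)
        return z                                   # pseudo-embeddings for replay

Under these assumptions, a step on the current task t would maximize flow.log_prob(h.detach(), t) on the backbone's embeddings h, while flow.sample(batch_size, t_old) generates pseudo-embeddings for an earlier task that are replayed through the classification head; the same invertible mapping can also serve to penalize current embeddings that become unlikely under the past tasks' conditional densities, which is the regularization side of the approach described in the abstract.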
