On Sequential Bayesian Inference for Continual Learning

Sequential Bayesian inference can be used for continual learning to prevent catastrophic forgetting of past tasks and to provide an informative prior when learning new tasks. We revisit sequential Bayesian inference and assess whether using the previous task's posterior as a prior for a new task can prevent catastrophic forgetting in Bayesian neural networks. Our first contribution is to perform sequential Bayesian inference using Hamiltonian Monte Carlo: we propagate the posterior as a prior for new tasks by fitting a density estimator to Hamiltonian Monte Carlo samples of the previous posterior. We find that this approach fails to prevent catastrophic forgetting, demonstrating the difficulty of performing sequential Bayesian inference in neural networks. We then study simple analytical examples of sequential Bayesian inference and continual learning, and highlight the issue of model misspecification, which can lead to sub-optimal continual learning performance despite exact inference. Furthermore, we discuss how imbalances in task data can cause forgetting. Given these limitations, we argue that we need probabilistic models of the continual learning generative process rather than sequential Bayesian inference over Bayesian neural network weights. Our final contribution is a simple baseline, Prototypical Bayesian Continual Learning, which is competitive with the best-performing Bayesian continual learning methods on class-incremental continual learning computer vision benchmarks.
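The posterior-propagation idea described above is the standard Bayesian recursion: the posterior over weights after task t-1 becomes the prior for task t, i.e. p(theta | D_1:t) is proportional to p(D_t | theta) p(theta | D_1:t-1). The sketch below illustrates this recursion under a simplifying assumption: a single multivariate Gaussian stands in for the density estimator fit to Hamiltonian Monte Carlo samples, and the names (`samples_task1`, `log_lik_task2`) are hypothetical placeholders rather than the paper's implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_posterior_density(samples):
    """Fit a Gaussian density estimator to HMC samples of flattened network weights.

    Illustrative assumption: a single Gaussian stands in for the density
    estimator fit to Hamiltonian Monte Carlo samples in the paper.
    """
    mean = samples.mean(axis=0)
    # Small jitter keeps the sample covariance positive definite.
    cov = np.cov(samples, rowvar=False) + 1e-6 * np.eye(samples.shape[1])
    return multivariate_normal(mean=mean, cov=cov)

def log_unnormalised_posterior(theta, log_likelihood_fn, prior_density):
    """Sequential Bayes update: task log-likelihood plus the old posterior used as the prior."""
    return log_likelihood_fn(theta) + prior_density.logpdf(theta)

# Usage sketch (hypothetical names): `samples_task1` would come from an HMC run on
# task 1, and `log_lik_task2` is the log-likelihood of the task-2 data. The resulting
# `log_post_task2` is the unnormalised target density for the next round of HMC sampling.
# prior_t2 = fit_posterior_density(samples_task1)
# log_post_task2 = lambda theta: log_unnormalised_posterior(theta, log_lik_task2, prior_t2)
```

A more flexible density estimator could replace the single Gaussian without changing the recursion itself; the abstract's point is that even with careful approximations of this form, catastrophic forgetting is not prevented.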
