Variational Auto-Regressive Gaussian Processes for Continual Learning

This paper proposes the Variational Auto-Regressive Gaussian Process (VAR-GP), a principled Bayesian updating mechanism for incorporating new data from sequential tasks in continual learning. It relies on a novel auto-regressive characterization of the variational distribution, and inference is made scalable through sparse inducing-point approximations. Experiments on standard continual learning benchmarks demonstrate that VAR-GPs perform well on new tasks without compromising performance on old ones, yielding results competitive with state-of-the-art methods. In addition, we qualitatively show how VAR-GP improves predictive entropy estimates as training proceeds over new tasks. Further, we conduct a thorough ablation study to verify the effectiveness of our inferential choices.

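As a reading aid, the following is a minimal sketch, under our own assumptions rather than the paper's exact parameterization, of what an auto-regressive variational distribution over per-task sparse inducing variables could look like; here $u_t$ denotes the inducing variables introduced for task $t$, $u_{<t}$ those from earlier tasks, and $f$ the latent GP function.

% Hedged sketch (not taken verbatim from the paper): an auto-regressive
% factorization of the variational posterior over inducing variables, with
% the GP predictive obtained by marginalizing them out. The actual
% conditionals and inducing-point placement used by VAR-GP may differ.
q(u_1, \ldots, u_T) \;=\; q(u_1) \prod_{t=2}^{T} q(u_t \mid u_{<t})

q(f \mid \mathcal{D}_{1:T}) \;\approx\; \int p(f \mid u_{1:T})\, q(u_1, \ldots, u_T)\, \mathrm{d}u_{1:T}

Under such a factorization, the conditional terms $q(u_t \mid u_{<t})$ are what carry information from earlier tasks forward, which is consistent with the paper's stated goal of Bayesian updating without revisiting old data.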