Variational Auto-Regressive Gaussian Processes for Continual Learning

This paper proposes the Variational Auto-Regressive Gaussian Process (VAR-GP), a principled Bayesian updating mechanism for incorporating new data from sequential tasks in continual learning. It relies on a novel auto-regressive characterization of the variational distribution, and inference is made scalable through sparse inducing-point approximations. Experiments on standard continual learning benchmarks demonstrate that VAR-GPs perform well on new tasks without compromising performance on old ones, yielding results competitive with state-of-the-art methods. In addition, we qualitatively show how VAR-GP improves predictive entropy estimates as training proceeds over new tasks. Further, we conduct a thorough ablation study to verify the effectiveness of our inferential choices.

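As a reading aid, the following is a minimal sketch, under our own assumptions rather than the paper's exact parameterization, of what an auto-regressive variational distribution over per-task sparse inducing variables could look like; here $u_t$ denotes the inducing variables introduced for task $t$, $u_{<t}$ those from earlier tasks, and $f$ the latent GP function.

% Hedged sketch (not taken verbatim from the paper): an auto-regressive
% factorization of the variational posterior over inducing variables, with
% the GP predictive obtained by marginalizing them out. The actual
% conditionals and inducing-point placement used by VAR-GP may differ.
q(u_1, \ldots, u_T) \;=\; q(u_1) \prod_{t=2}^{T} q(u_t \mid u_{<t})

q(f \mid \mathcal{D}_{1:T}) \;\approx\; \int p(f \mid u_{1:T})\, q(u_1, \ldots, u_T)\, \mathrm{d}u_{1:T}

Under such a factorization, the conditional terms $q(u_t \mid u_{<t})$ are what carry information from earlier tasks forward, which is consistent with the paper's stated goal of Bayesian updating without revisiting old data.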