Functional Regularisation for Continual Learning using Gaussian Processes

We introduce a framework for Continual Learning (CL) based on Bayesian inference over the function space rather than the parameters of a deep neural network. This method, referred to as functional regularisation for Continual Learning, avoids forgetting previously learnt tasks by constructing and memorising an approximate posterior belief over each underlying task-specific function. To achieve this we rely on a Gaussian process obtained by treating the weights of the last layer of a neural network as random and Gaussian distributed. The training algorithm then encounters tasks sequentially and constructs posterior beliefs over the task-specific functions using inducing-point sparse Gaussian process methods. At each step a new task is first learnt and then a summary is constructed consisting of (i) inducing inputs -- a fixed-size subset of the task inputs selected such that it optimally represents the task -- and (ii) a posterior distribution over the function values at these inputs. This summary then regularises learning of future tasks through Kullback-Leibler regularisation terms. Our method thus unites approaches focused on (pseudo-)rehearsal with those derived from a sequential Bayesian inference perspective in a principled way, leading to strong results on established benchmarks.
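To make the regularisation described above concrete, the following minimal NumPy sketch (not the authors' code) illustrates the core ingredients under simplifying assumptions: a fixed random feature map stands in for the trained network body, the Gaussian last layer induces the kernel k(x, x') = phi(x)^T phi(x'), and each stored task summary is a set of inducing inputs Z together with a Gaussian posterior over f(Z). The names TaskSummary and kl_regulariser are illustrative only; the penalty returned would be added to the loss of the task currently being learnt.

import numpy as np

def phi(X, W_body):
    # Feature map given by the (frozen, here random) network body.
    return np.tanh(X @ W_body)

def kernel(X1, X2, W_body):
    # Linear kernel in feature space induced by a Gaussian last layer.
    return phi(X1, W_body) @ phi(X2, W_body).T

class TaskSummary:
    # Inducing inputs Z and a Gaussian posterior N(mu, Sigma) over f(Z).
    def __init__(self, Z, mu, Sigma):
        self.Z, self.mu, self.Sigma = Z, mu, Sigma

def kl_gaussians(mu_q, Sigma_q, mu_p, Sigma_p):
    # KL( N(mu_q, Sigma_q) || N(mu_p, Sigma_p) ) for full-covariance Gaussians.
    d = mu_q.shape[0]
    L_p = np.linalg.cholesky(Sigma_p)
    solve = np.linalg.solve
    Sp_inv_Sq = solve(L_p.T, solve(L_p, Sigma_q))
    diff = mu_p - mu_q
    maha = diff @ solve(L_p.T, solve(L_p, diff))
    logdet_p = 2 * np.sum(np.log(np.diag(L_p)))
    logdet_q = 2 * np.sum(np.log(np.diag(np.linalg.cholesky(Sigma_q))))
    return 0.5 * (np.trace(Sp_inv_Sq) + maha - d + logdet_p - logdet_q)

def kl_regulariser(summaries, W_body, jitter=1e-6):
    # Sum of KL terms tying the current model to each stored task posterior:
    # the GP prior over f(Z) under the current body parameters is N(0, K_zz).
    total = 0.0
    for s in summaries:
        K_zz = kernel(s.Z, s.Z, W_body) + jitter * np.eye(len(s.Z))
        total += kl_gaussians(s.mu, s.Sigma, np.zeros(len(s.Z)), K_zz)
    return total

# Toy usage: one stored summary regularises training on a new task.
rng = np.random.default_rng(0)
W_body = rng.normal(size=(2, 16)) / np.sqrt(2)   # "network body" parameters
Z = rng.normal(size=(5, 2))                      # inducing inputs from task 1
K_zz = kernel(Z, Z, W_body) + 1e-6 * np.eye(5)
summary = TaskSummary(Z, mu=rng.normal(size=5), Sigma=K_zz.copy())

penalty = kl_regulariser([summary], W_body)
print(f"KL regularisation term: {penalty:.4f}")  # added to the new task's loss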
