Functional Regularisation for Continual Learning using Gaussian Processes

We introduce a framework for Continual Learning (CL) based on Bayesian inference over the function space rather than the parameters of a deep neural network. This method, referred to as functional regularisation for Continual Learning, avoids forgetting previously learnt tasks by constructing and memorising an approximate posterior belief over each underlying task-specific function. To achieve this we rely on a Gaussian process obtained by treating the weights of the last layer of a neural network as random and Gaussian distributed. The training algorithm then encounters tasks sequentially and constructs posterior beliefs over the task-specific functions using inducing-point sparse Gaussian process methods. At each step a new task is first learnt and then a summary is constructed consisting of (i) inducing inputs -- a fixed-size subset of the task inputs selected such that it optimally represents the task -- and (ii) a posterior distribution over the function values at these inputs. This summary then regularises learning of future tasks through Kullback-Leibler regularisation terms. Our method thus unites approaches focused on (pseudo-)rehearsal with those derived from a sequential Bayesian inference perspective in a principled way, leading to strong results on established benchmarks.
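To make the regularisation described above concrete, the following minimal NumPy sketch (not the authors' code) illustrates the core ingredients under simplifying assumptions: a fixed random feature map stands in for the trained network body, the Gaussian last layer induces the kernel k(x, x') = phi(x)^T phi(x'), and each stored task summary is a set of inducing inputs Z together with a Gaussian posterior over f(Z). The names TaskSummary and kl_regulariser are illustrative only; the penalty returned would be added to the loss of the task currently being learnt.

import numpy as np

def phi(X, W_body):
    # Feature map given by the (frozen, here random) network body.
    return np.tanh(X @ W_body)

def kernel(X1, X2, W_body):
    # Linear kernel in feature space induced by a Gaussian last layer.
    return phi(X1, W_body) @ phi(X2, W_body).T

class TaskSummary:
    # Inducing inputs Z and a Gaussian posterior N(mu, Sigma) over f(Z).
    def __init__(self, Z, mu, Sigma):
        self.Z, self.mu, self.Sigma = Z, mu, Sigma

def kl_gaussians(mu_q, Sigma_q, mu_p, Sigma_p):
    # KL( N(mu_q, Sigma_q) || N(mu_p, Sigma_p) ) for full-covariance Gaussians.
    d = mu_q.shape[0]
    L_p = np.linalg.cholesky(Sigma_p)
    solve = np.linalg.solve
    Sp_inv_Sq = solve(L_p.T, solve(L_p, Sigma_q))
    diff = mu_p - mu_q
    maha = diff @ solve(L_p.T, solve(L_p, diff))
    logdet_p = 2 * np.sum(np.log(np.diag(L_p)))
    logdet_q = 2 * np.sum(np.log(np.diag(np.linalg.cholesky(Sigma_q))))
    return 0.5 * (np.trace(Sp_inv_Sq) + maha - d + logdet_p - logdet_q)

def kl_regulariser(summaries, W_body, jitter=1e-6):
    # Sum of KL terms tying the current model to each stored task posterior:
    # the GP prior over f(Z) under the current body parameters is N(0, K_zz).
    total = 0.0
    for s in summaries:
        K_zz = kernel(s.Z, s.Z, W_body) + jitter * np.eye(len(s.Z))
        total += kl_gaussians(s.mu, s.Sigma, np.zeros(len(s.Z)), K_zz)
    return total

# Toy usage: one stored summary regularises training on a new task.
rng = np.random.default_rng(0)
W_body = rng.normal(size=(2, 16)) / np.sqrt(2)   # "network body" parameters
Z = rng.normal(size=(5, 2))                      # inducing inputs from task 1
K_zz = kernel(Z, Z, W_body) + 1e-6 * np.eye(5)
summary = TaskSummary(Z, mu=rng.normal(size=5), Sigma=K_zz.copy())

penalty = kl_regulariser([summary], W_body)
print(f"KL regularisation term: {penalty:.4f}")  # added to the new task's loss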
