Functional Regularisation for Continual Learning

We introduce a framework for continual learning based on Bayesian inference over the function space rather than the parameters of a deep neural network. This method, referred to as functional regularisation for continual learning, avoids forgetting a previous task by constructing and memorising an approximate posterior belief over the underlying task-specific function. To achieve this, we rely on a Gaussian process obtained by treating the weights of the last layer of a neural network as random and Gaussian distributed. The training algorithm then encounters tasks sequentially and constructs posterior beliefs over the task-specific functions using inducing-point sparse Gaussian process methods. At each step a new task is first learnt, and then a summary is constructed consisting of (i) inducing inputs and (ii) a posterior distribution over the function values at these inputs. This summary regularises learning of future tasks through Kullback-Leibler regularisation terms, so that catastrophic forgetting of earlier tasks is avoided. We demonstrate our algorithm on classification benchmarks such as Split-MNIST, Permuted-MNIST and Omniglot.
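
To make the regularisation concrete, here is a minimal sketch (not the authors' implementation) of how the stored summaries could enter a training loss. It assumes each summary for a past task i is a set of inducing inputs Z_i with a Gaussian posterior N(m_i, S_i) over the function values there, and that the Gaussian last layer induces a linear kernel in the shared feature space; the `features` callable, the jitter level, and the direction of the KL term are illustrative assumptions.

```python
import numpy as np

def gaussian_kl(m, S, K):
    """KL( N(m, S) || N(0, K) ) between two d-dimensional Gaussians."""
    d = m.shape[0]
    K_inv = np.linalg.inv(K)
    _, logdet_K = np.linalg.slogdet(K)
    _, logdet_S = np.linalg.slogdet(S)
    return 0.5 * (np.trace(K_inv @ S) + m @ K_inv @ m - d + logdet_K - logdet_S)

def linear_kernel(features, Z, sigma_w2=1.0):
    """Kernel induced by a Gaussian last layer: k(x, x') = sigma_w^2 * phi(x)^T phi(x')."""
    Phi = features(Z)                        # (M, D) features from the shared network
    return sigma_w2 * Phi @ Phi.T + 1e-6 * np.eye(Phi.shape[0])  # jitter for stability

def functional_regulariser(task_summaries, features):
    """Sum of KL terms anchoring the current network to the stored task posteriors."""
    total = 0.0
    for Z_i, m_i, S_i in task_summaries:     # (inducing inputs, posterior mean, posterior cov)
        K_i = linear_kernel(features, Z_i)   # prior covariance at Z_i under current features
        total += gaussian_kl(m_i, S_i, K_i)
    return total
```

In this sketch the penalty is added to the current task's variational objective, so that the shared feature extractor cannot drift in a way that contradicts the memorised posteriors over past task functions.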
