Functional Regularisation for Continual Learning

We introduce a framework for continual learning based on Bayesian inference over the function space rather than the parameters of a deep neural network. This method, referred to as functional regularisation for continual learning, avoids forgetting a previous task by constructing and memorising an approximate posterior belief over the underlying task-specific function. To achieve this, we rely on a Gaussian process obtained by treating the weights of the last layer of a neural network as random and Gaussian distributed. The training algorithm then encounters tasks sequentially and constructs posterior beliefs over the task-specific functions using inducing-point sparse Gaussian process methods. At each step a new task is first learnt, and then a summary is constructed consisting of (i) inducing inputs and (ii) a posterior distribution over the function values at these inputs. This summary regularises learning of future tasks through Kullback-Leibler regularisation terms, so that catastrophic forgetting of earlier tasks is avoided. We demonstrate our algorithm on classification benchmarks such as Split-MNIST, Permuted-MNIST and Omniglot.
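
To make the regularisation concrete, here is a minimal sketch (not the authors' implementation) of how the stored summaries could enter a training loss. It assumes each summary for a past task i is a set of inducing inputs Z_i with a Gaussian posterior N(m_i, S_i) over the function values there, and that the Gaussian last layer induces a linear kernel in the shared feature space; the `features` callable, the jitter level, and the direction of the KL term are illustrative assumptions.

```python
import numpy as np

def gaussian_kl(m, S, K):
    """KL( N(m, S) || N(0, K) ) between two d-dimensional Gaussians."""
    d = m.shape[0]
    K_inv = np.linalg.inv(K)
    _, logdet_K = np.linalg.slogdet(K)
    _, logdet_S = np.linalg.slogdet(S)
    return 0.5 * (np.trace(K_inv @ S) + m @ K_inv @ m - d + logdet_K - logdet_S)

def linear_kernel(features, Z, sigma_w2=1.0):
    """Kernel induced by a Gaussian last layer: k(x, x') = sigma_w^2 * phi(x)^T phi(x')."""
    Phi = features(Z)                        # (M, D) features from the shared network
    return sigma_w2 * Phi @ Phi.T + 1e-6 * np.eye(Phi.shape[0])  # jitter for stability

def functional_regulariser(task_summaries, features):
    """Sum of KL terms anchoring the current network to the stored task posteriors."""
    total = 0.0
    for Z_i, m_i, S_i in task_summaries:     # (inducing inputs, posterior mean, posterior cov)
        K_i = linear_kernel(features, Z_i)   # prior covariance at Z_i under current features
        total += gaussian_kl(m_i, S_i, K_i)
    return total
```

In this sketch the penalty is added to the current task's variational objective, so that the shared feature extractor cannot drift in a way that contradicts the memorised posteriors over past task functions.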
