Deep Mean Functions for Meta-Learning in Gaussian Processes

Fitting machine learning models in the low-data regime is challenging. The main difficulty lies in obtaining suitable prior knowledge and encoding it into the model, for instance in the form of a Gaussian process prior. Recent advances in meta-learning offer powerful methods for extracting such prior knowledge from data acquired in related tasks. For Gaussian process models, however, meta-learning approaches have mostly focused on learning the kernel function of the prior, not its mean function. In this work, we propose to parameterize the mean function of a Gaussian process with a deep neural network and to train it with a meta-learning procedure. We present analytical and empirical evidence that mean function learning can be superior to kernel learning alone, particularly when data is scarce.
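
To make the proposed construction concrete, below is a minimal sketch of a Gaussian process whose mean function is a neural network, meta-trained across related tasks by maximizing the marginal likelihood with respect to shared parameters. It assumes GPyTorch as the backend; the `DeepMean` module, the MLP architecture, and the per-task training loop are illustrative assumptions, not the paper's exact procedure.

```python
import torch
import gpytorch

class DeepMean(gpytorch.means.Mean):
    """GP prior mean parameterized by a small MLP (illustrative architecture)."""
    def __init__(self, input_dim, hidden_dim=64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(input_dim, hidden_dim),
            torch.nn.Tanh(),
            torch.nn.Linear(hidden_dim, 1),
        )

    def forward(self, x):
        # Map each input point to a scalar prior mean value.
        return self.net(x).squeeze(-1)

class DeepMeanGP(gpytorch.models.ExactGP):
    """Exact GP combining the deep mean with a standard RBF kernel."""
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = DeepMean(train_x.size(-1))
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )

def meta_train(tasks, num_epochs=200, lr=1e-2):
    """Meta-train the shared mean-network weights and kernel hyperparameters.

    tasks: list of (x, y) tensor pairs, one pair per related task.
    """
    x0, y0 = tasks[0]
    likelihood = gpytorch.likelihoods.GaussianLikelihood()
    model = DeepMeanGP(x0, y0, likelihood)
    mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    model.train()
    likelihood.train()
    for _ in range(num_epochs):
        for x, y in tasks:
            # Swap in the current task's data; all parameters stay shared.
            model.set_train_data(x, y, strict=False)
            optimizer.zero_grad()
            loss = -mll(model(x), y)  # negative marginal log likelihood
            loss.backward()
            optimizer.step()
    return model, likelihood
```

At meta-test time, the shared mean network and kernel hyperparameters would be held fixed, and standard GP posterior inference conditions on the few observations of the new task.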
