Non-Gaussian Gaussian Processes for Few-Shot Regression

Gaussian Processes (GPs) have been widely used in machine learning to model distributions over functions, with applications including multi-modal regression, time-series prediction, and few-shot learning. GPs are particularly well suited to the last of these because they rely on Normal distributions and therefore allow closed-form computation of the posterior predictive distribution. Unfortunately, because that posterior is not flexible enough to capture complex distributions, GPs assume high similarity between subsequent tasks, a requirement rarely met in real-world conditions. In this work, we address this limitation by leveraging the flexibility of Normalizing Flows to modulate the posterior predictive distribution of the GP. This makes the GP posterior locally non-Gaussian; hence we name our method Non-Gaussian Gaussian Processes (NGGPs). We propose an invertible ODE-based mapping that operates on each component of the random-variable vector and shares its parameters across all components. We empirically test the flexibility of NGGPs on a variety of few-shot regression datasets and show that the mapping can incorporate context-embedding information to model, for example, different noise levels of periodic functions. As a result, our method shares the structure of the problem across subsequent tasks, while the contextualization allows adaptation to dissimilarities between them. NGGPs outperform competing state-of-the-art approaches on a diverse set of benchmarks and applications.
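To make the mechanism concrete, below is a minimal NumPy sketch of the idea described above: exact GP regression on a toy few-shot task, whose posterior samples are then pushed through a shared, elementwise, ODE-based invertible mapping integrated with Euler steps. The vector field f(z, t) = a·tanh(b·z + c) and its parameter values are illustrative placeholders, not the learned, context-conditioned continuous normalizing flow used in the paper.

```python
import numpy as np

# --- Standard exact GP regression with an RBF kernel (closed form) ---
def rbf(x1, x2, ls=1.0, var=1.0):
    d = x1[:, None] - x2[None, :]
    return var * np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=0.1):
    K = rbf(x_train, x_train) + noise ** 2 * np.eye(len(x_train))
    Ks = rbf(x_train, x_test)
    Kss = rbf(x_test, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    cov = Kss - v.T @ v
    return mean, cov

# --- Elementwise ODE flow with shared parameters (illustrative values;
# in NGGP these would be learned and conditioned on a task embedding) ---
a, b, c = 0.8, 1.5, 0.0

def flow(z, n_steps=20):
    """Euler-integrate dz/dt = a * tanh(b*z + c) from t=0 to t=1.

    Also accumulates the log-density correction via the instantaneous
    change of variables, d(log p)/dt = -df/dz, so that
    log p(z(1)) = log p(z(0)) - logdet.
    """
    dt = 1.0 / n_steps
    logdet = np.zeros_like(z)
    for _ in range(n_steps):
        t = np.tanh(b * z + c)
        dz = a * t
        dfdz = a * b * (1.0 - t ** 2)  # analytic derivative of the field
        z = z + dt * dz
        logdet = logdet + dt * dfdz
    return z, logdet

# Toy few-shot task: 5 noisy observations of a sine wave
rng = np.random.default_rng(0)
x_tr = rng.uniform(-3, 3, 5)
y_tr = np.sin(x_tr) + 0.1 * rng.normal(size=5)
x_te = np.linspace(-3, 3, 100)

mean, cov = gp_posterior(x_tr, y_tr, x_te)
samples = rng.multivariate_normal(mean, cov + 1e-8 * np.eye(len(x_te)), size=10)

# Push each Gaussian posterior sample through the shared elementwise flow,
# yielding samples from a locally non-Gaussian predictive distribution.
warped, logdet = flow(samples)
print(warped.shape, logdet.shape)  # (10, 100) (10, 100)
```

Because the same (a, b, c) act on every output component, the flow adds a constant number of parameters regardless of how many test points are predicted, mirroring the parameter sharing described in the abstract.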
