The Functional Neural Process

We present a new family of exchangeable stochastic processes, the Functional Neural Processes (FNPs). FNPs model distributions over functions by learning a graph of dependencies on top of latent representations of the points in the given dataset. In doing so, they define a Bayesian model without explicitly positing a prior distribution over latent global parameters; instead, they adopt priors over the relational structure of the given dataset, which is a much simpler task. We show how such models can be learned from data, demonstrate that they scale to large datasets through mini-batch optimization, and describe how to make predictions for new points via their posterior predictive distribution. We experimentally evaluate FNPs on toy regression and image classification tasks and show that, compared to baselines that employ global latent parameters, they offer both competitive predictions and more robust uncertainty estimates.
