Attentive Neural Processes

Neural Processes (NPs) (Garnelo et al., 2018a;b) approach regression by learning to map a context set of observed input-output pairs to a distribution over regression functions. Each function models the distribution of the output given an input, conditioned on the context. NPs have the benefit of fitting observed data efficiently, with linear complexity in the number of context input-output pairs, and can learn a wide family of conditional distributions; they learn predictive distributions conditioned on context sets of arbitrary size. Nonetheless, we show that NPs suffer from a fundamental drawback of underfitting, giving inaccurate predictions at the inputs of the very observations they condition on. We address this issue by incorporating attention into NPs, allowing each input location to attend to the relevant context points for the prediction. We show that this greatly improves the accuracy of predictions, results in noticeably faster training, and expands the range of functions that can be modelled.
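To make the attention mechanism concrete, below is a minimal sketch (not the authors' code) of the cross-attention step described in the abstract: each target input location acts as a query over the encoded context points, so its representation is built from the context points most relevant to it. All module names, projection layers, and tensor sizes here are illustrative assumptions, not the paper's architecture.

```python
# Sketch of cross-attention from target inputs to context points, as used in
# Attentive Neural Processes. Hypothetical names/shapes; for illustration only.
import torch
import torch.nn as nn


class AttentiveAggregator(nn.Module):
    """Each target input attends over per-context representations."""

    def __init__(self, x_dim, r_dim, num_heads=4):
        super().__init__()
        # Map raw input locations into the representation space used for attention.
        self.query_proj = nn.Linear(x_dim, r_dim)
        self.key_proj = nn.Linear(x_dim, r_dim)
        self.attn = nn.MultiheadAttention(r_dim, num_heads, batch_first=True)

    def forward(self, x_target, x_context, r_context):
        # x_target:  [batch, n_target, x_dim]   target input locations (queries)
        # x_context: [batch, n_context, x_dim]  observed input locations (keys)
        # r_context: [batch, n_context, r_dim]  per-context representations (values)
        q = self.query_proj(x_target)
        k = self.key_proj(x_context)
        # Each target location attends to the relevant context points,
        # replacing the mean-pooled representation of the original NP.
        r_target, _ = self.attn(q, k, r_context)
        return r_target  # [batch, n_target, r_dim]


# Usage with random data, purely to check shapes.
agg = AttentiveAggregator(x_dim=1, r_dim=64)
x_c, r_c = torch.randn(8, 10, 1), torch.randn(8, 10, 64)
x_t = torch.randn(8, 50, 1)
print(agg(x_t, x_c, r_c).shape)  # torch.Size([8, 50, 64])
```

The resulting target-specific representation would then feed the NP decoder, so predictions at observed context inputs can stay close to the observed outputs rather than being smoothed out by a single averaged context vector.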

[1] Geoffrey E. Hinton et al., The "wake-sleep" algorithm for unsupervised neural networks, 1995, Science.

[2] Yoshua Bengio et al., Gradient-based learning applied to document recognition, 1998, Proc. IEEE.

[3] Yee Whye Teh et al., Semiparametric latent factor models, 2005, AISTATS.

[4] Edwin V. Bonilla et al., Multi-task Gaussian Process Prediction, 2007, NIPS.

[5] Alex Graves et al., Supervised Sequence Labelling with Recurrent Neural Networks, 2012, Studies in Computational Intelligence.

[6] Daan Wierstra et al., Stochastic Backpropagation and Approximate Inference in Deep Generative Models, 2014, ICML.

[7] Max Welling et al., Auto-Encoding Variational Bayes, 2013, ICLR.

[8] Xiaogang Wang et al., Deep Learning Face Attributes in the Wild, 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[9] Jimmy Ba et al., Adam: A Method for Stochastic Optimization, 2014, ICLR.

[10] Yoshua Bengio et al., Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.

[11] Daan Wierstra et al., One-Shot Generalization in Deep Generative Models, 2016, ICML.

[12] Andrew Gordon Wilson et al., Deep Kernel Learning, 2015, AISTATS.

[13] Daan Wierstra et al., One-shot Learning with Memory-Augmented Neural Networks, 2016, ArXiv.

[14] Oriol Vinyals et al., Matching Networks for One Shot Learning, 2016, NIPS.

[15] Lukasz Kaiser et al., Attention Is All You Need, 2017, NIPS.

[16] Amos J. Storkey et al., Towards a Neural Statistician, 2016, ICLR.

[17] Neil D. Lawrence et al., Efficient Modeling of Latent Information in Supervised Learning using Gaussian Processes, 2017, NIPS.

[18] Dmitry P. Vetrov et al., Fast Adaptation in Generative Models with Generative Matching Networks, 2016, ICLR.

[19] Richard S. Zemel et al., Prototypical Networks for Few-shot Learning, 2017, NIPS.

[20] Jörg Bornschein et al., Variational Memory Addressing in Generative Models, 2017, NIPS.

[21] Philip Bachman et al., VFunc: a Deep Generative Model for Functions, 2018, ArXiv.

[22] Joshua B. Tenenbaum et al., The Variational Homoencoder: Learning to learn high capacity generative models from few examples, 2018, UAI.

[23] Pieter Abbeel et al., A Simple Neural Attentive Meta-Learner, 2017, ICLR.

[24] Fabio Viola et al., Learning models for visual 3D localization with implicit mapping, 2018, ArXiv.

[25] Yee Whye Teh et al., Conditional Neural Processes, 2018, ICML.

[26] Thomas Paine et al., Few-shot Autoregressive Density Estimation: Towards Learning to Learn Distributions, 2017, ICLR.

[27] Koray Kavukcuoglu et al., Neural scene representation and rendering, 2018, Science.

[28] Murray Shanahan et al., Consistent Generative Query Networks, 2018, ArXiv.

[29] José Miguel Hernández-Lobato et al., Variational Implicit Processes, 2018, ICML.