BRUNO: A Deep Recurrent Model for Exchangeable Data

We present a novel model architecture that leverages deep learning tools to perform exact Bayesian inference on sets of high-dimensional, complex observations. Our model is provably exchangeable, meaning that the joint distribution over observations is invariant under permutation: this property lies at the heart of Bayesian inference. The model requires no variational approximations to train, and new samples can be generated conditional on previous samples, with cost linear in the size of the conditioning set. The advantages of our architecture are demonstrated on learning tasks that require generalisation from short observed sequences while modelling sequence variability, such as conditional image generation, few-shot learning, and anomaly detection.
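
To make the central property concrete: a sequence x_1, ..., x_n is exchangeable when its joint density is unchanged by any reordering of its elements, and by de Finetti's theorem such sequences behave as i.i.d. draws conditioned on some latent parameter, which is why exchangeability sits at the core of Bayesian inference. A minimal statement of the property and of the predictive distribution used for conditional generation follows (the permutation symbol \pi and the sequence length n are our notation, not taken from the paper):

\[
p(x_1, \dots, x_n) = p\big(x_{\pi(1)}, \dots, x_{\pi(n)}\big)
\quad \text{for every permutation } \pi \text{ of } \{1, \dots, n\}.
\]

Generating a new observation conditional on previous ones then amounts to sampling from the predictive distribution

\[
p(x_{n+1} \mid x_1, \dots, x_n),
\]

which, per the claim above, the model evaluates at cost linear in n, the size of the conditioning set.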
