One-Shot Generalization in Deep Generative Models

Humans have an impressive ability to reason about new concepts and experiences from just a single example. In particular, humans have an ability for one-shot generalization: an ability to encounter a new concept, understand its structure, and then generate compelling alternative variations of the concept. We build machine learning systems with this important capacity by developing new deep generative models: models that combine the representational power of deep learning with the inferential power of Bayesian reasoning. We develop a class of sequential generative models built on the principles of feedback and attention. These two characteristics lead to generative models that are among the state of the art in density estimation and image generation. We demonstrate the one-shot generalization ability of our models on three tasks: unconditional sampling, generating new exemplars of a given concept, and generating new exemplars of a family of concepts. In all cases our models generate compelling and diverse samples after seeing new examples just once, providing an important class of general-purpose models for one-shot machine learning.
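The abstract describes the sequential generative process only in words. The block below is a minimal sketch, assuming a toy 1-D canvas and random weights in place of trained networks, of a sampling loop in that spirit. At each step a latent `z_t` is drawn from a standard normal, a recurrent state `h_t` is updated from `h_{t-1}` and `z_t`, and an attention mechanism computed from `h_t` decides where on the canvas a small patch is written. The canvas accumulates additively, and a sigmoid turns it into Bernoulli pixel means. The attention is a DRAW-style Gaussian filterbank swapped in for simplicity; it is not claimed to be the authors' write mechanism. All names (`sample_sequential`, `gaussian_write_filter`, the size constants) are hypothetical, and only sampling is shown: there is no inference network and no training.

```python
import math
import random

# Toy sizes chosen for illustration only (hypothetical, not from the paper).
CANVAS_LEN = 16  # length of a 1-D "image" canvas
PATCH_LEN = 4    # number of values written per step through attention
HIDDEN = 3       # size of the recurrent state h_t
LATENT = 2       # size of the per-step latent variable z_t


def rand_matrix(rows, cols, rng, scale=0.5):
    """Random weights standing in for learned parameters (assumption)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def gaussian_write_filter(center, stride, sigma, n, k):
    """DRAW-style 1-D Gaussian filterbank mapping a k-patch onto an n-canvas.

    Column j is a Gaussian bump centered at center + (j - (k - 1) / 2) * stride,
    normalized to sum to 1 over the canvas positions.
    """
    filt = [[0.0] * k for _ in range(n)]
    for j in range(k):
        mu = center + (j - (k - 1) / 2.0) * stride
        col = [math.exp(-((i - mu) ** 2) / (2.0 * sigma ** 2)) for i in range(n)]
        total = sum(col) or 1.0  # guard: bump lies entirely off the canvas
        for i in range(n):
            filt[i][j] = col[i] / total
    return filt


def sample_sequential(steps, seed=0):
    """Return Bernoulli means for a canvas after `steps` attend-and-write rounds."""
    rng = random.Random(seed)
    w_h = rand_matrix(HIDDEN, HIDDEN, rng)         # recurrence h_{t-1} -> h_t
    w_z = rand_matrix(HIDDEN, LATENT, rng)         # latent z_t -> h_t
    w_patch = rand_matrix(PATCH_LEN, HIDDEN, rng)  # h_t -> patch content
    w_loc = rand_matrix(2, HIDDEN, rng)            # h_t -> (where, zoom)

    h = [0.0] * HIDDEN
    canvas = [0.0] * CANVAS_LEN
    for _ in range(steps):
        z = [rng.gauss(0.0, 1.0) for _ in range(LATENT)]  # z_t ~ N(0, I)
        pre = [a + b for a, b in zip(matvec(w_h, h), matvec(w_z, z))]
        h = [math.tanh(v) for v in pre]  # state carries information across steps

        # Attention parameters: center squashed into [0, CANVAS_LEN - 1], stride > 0.
        loc = matvec(w_loc, h)
        center = (CANVAS_LEN - 1) * (math.tanh(loc[0]) + 1.0) / 2.0
        stride = math.exp(loc[1])

        patch = matvec(w_patch, h)
        filt = gaussian_write_filter(center, stride, 1.0, CANVAS_LEN, PATCH_LEN)
        write = matvec(filt, patch)
        canvas = [c + d for c, d in zip(canvas, write)]  # additive canvas update

    # Observation model: sigmoid of the final canvas gives Bernoulli pixel means.
    return [1.0 / (1.0 + math.exp(-c)) for c in canvas]


if __name__ == "__main__":
    print([round(p, 3) for p in sample_sequential(steps=5)])
```

With `steps=0` nothing is written, so every pixel mean is 0.5; each additional step adds one attended write, so the model can build a sample up piece by piece rather than in a single pass.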