Conditional Image Generation with PixelCNN Decoders

This work explores conditional image generation with a new image density model based on the PixelCNN architecture. The model can be conditioned on any vector, including descriptive labels or tags, or latent embeddings created by other networks. When conditioned on class labels from the ImageNet database, the model is able to generate diverse, realistic scenes representing distinct animals, objects, landscapes and structures. When conditioned on an embedding produced by a convolutional network given a single image of an unseen face, it generates a variety of new portraits of the same person with different facial expressions, poses and lighting conditions. We also show that conditional PixelCNN can serve as a powerful decoder in an image autoencoder. Additionally, the gated convolutional layers in the proposed model improve the log-likelihood of PixelCNN to match the state-of-the-art performance of PixelRNN on ImageNet, with greatly reduced computational cost.

[1]  T. Munich,et al.  Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks , 2008, NIPS.

[2]  Jürgen Schmidhuber,et al.  Stacked Convolutional Auto-Encoders for Hierarchical Feature Extraction , 2011, ICANN.

[3]  Hugo Larochelle,et al.  The Neural Autoregressive Distribution Estimator , 2011, AISTATS.

[4]  Joshua B. Tenenbaum,et al.  Learning with Hierarchical-Deep Models , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Benjamin Schrauwen,et al.  Factoring Variations in Natural Images with Deep Gaussian Mixture Models , 2014, NIPS.

[6]  Daan Wierstra,et al.  Stochastic Backpropagation and Approximate Inference in Deep Generative Models , 2014, ICML.

[7]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[8]  Daan Wierstra,et al.  Deep AutoRegressive Networks , 2013, ICML.

[9]  Benjamin Schrauwen,et al.  The student-t mixture as a natural image patch prior with application to image compression , 2014, J. Mach. Learn. Res..

[10]  Surya Ganguli,et al.  Deep Unsupervised Learning using Nonequilibrium Thermodynamics , 2015, ICML.

[11]  Rob Fergus,et al.  Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks , 2015, NIPS.

[12]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Alex Graves,et al.  DRAW: A Recurrent Neural Network For Image Generation , 2015, ICML.

[14]  Alexander Mordvintsev,et al.  Inceptionism: Going Deeper into Neural Networks , 2015 .

[15]  Jürgen Schmidhuber,et al.  Training Very Deep Networks , 2015, NIPS.

[16]  Yoshua Bengio,et al.  NICE: Non-linear Independent Components Estimation , 2014, ICLR.

[17]  Aäron van den Oord,et al.  Locally-connected transformations for deep GMMs , 2015, ICML 2015.

[18]  Honglak Lee,et al.  Action-Conditional Video Prediction using Deep Networks in Atari Games , 2015, NIPS.

[19]  Matthias Bethge,et al.  Generative Image Modeling Using Spatial LSTMs , 2015, NIPS.

[20]  Leon A. Gatys,et al.  A Neural Algorithm of Artistic Style , 2015, ArXiv.

[21]  Daan Wierstra,et al.  One-Shot Generalization in Deep Generative Models , 2016, ICML.

[22]  Ruslan Salakhutdinov,et al.  Generating Images from Captions with Attention , 2015, ICLR.

[23]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Matthias Bethge,et al.  A note on the evaluation of generative models , 2015, ICLR.

[25]  Hugo Larochelle,et al.  Neural Autoregressive Distribution Estimation , 2016, J. Mach. Learn. Res..

[26]  Koray Kavukcuoglu,et al.  Pixel Recurrent Neural Networks , 2016, ICML.

[27]  Alex Graves,et al.  Grid Long Short-Term Memory , 2015, ICLR.

[28]  Lukasz Kaiser,et al.  Neural GPUs Learn Algorithms , 2015, ICLR.

[29]  Bernt Schiele,et al.  Generative Adversarial Text to Image Synthesis , 2016, ICML.

[30]  Tom Schaul,et al.  Unifying Count-Based Exploration and Intrinsic Motivation , 2016, NIPS.

[31]  Martín Abadi,et al.  TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.

[32]  Daan Wierstra,et al.  Towards Conceptual Compression , 2016, NIPS.

[33]  Frank Gabel Generative Adversarial Text-to-Image Synthesis , 2018 .