Paper Information - Parallel Multiscale Autoregressive Density Estimation

Parallel Multiscale Autoregressive Density Estimation

PixelCNN achieves state-of-the-art results in density estimation for natural images. Although training is fast, inference is costly, requiring one network evaluation per pixel: O(N) evaluations for N pixels. This can be sped up by caching activations, but still involves generating each pixel sequentially. In this work, we propose a parallelized PixelCNN that allows more efficient inference by modeling certain pixel groups as conditionally independent. Our new PixelCNN model achieves competitive density estimation and an orders-of-magnitude speedup (O(log N) sampling instead of O(N)), enabling the practical generation of 512x512 images. We evaluate the model on class-conditional image generation, text-to-image synthesis, and action-conditional video generation, showing that our model achieves the best results among non-pixel-autoregressive density models that allow efficient sampling.
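The gap between O(N) and O(log N) sampling can be illustrated with a rough step count. The sketch below is not the paper's implementation. It assumes a hypothetical scheme in which a small base image is sampled pixel by pixel and each resolution doubling then costs a fixed number of parallel group evaluations. The function names and the parameters `base_side` and `groups_per_doubling` are illustrative assumptions, not values taken from the abstract.

```python
def sequential_steps(side: int) -> int:
    """Sequential network evaluations for pixel-by-pixel sampling
    of a side x side image: one evaluation per pixel, i.e. O(N)."""
    return side * side


def multiscale_steps(side: int, base_side: int = 4,
                     groups_per_doubling: int = 3) -> int:
    """Sequential steps under an assumed multiscale scheme.

    Assumption (hypothetical, for illustration only): a base_side x base_side
    image is sampled pixel by pixel, then every doubling of the resolution
    needs groups_per_doubling evaluations, each filling in one group of
    conditionally independent pixels in parallel.
    """
    if side < base_side or side % base_side != 0:
        raise ValueError("side must be a multiple of base_side")
    ratio = side // base_side
    if ratio & (ratio - 1):
        raise ValueError("side / base_side must be a power of two")
    doublings = ratio.bit_length() - 1  # exact log2 of a power of two
    return base_side * base_side + groups_per_doubling * doublings


if __name__ == "__main__":
    for side in (32, 128, 512):
        print(side, sequential_steps(side), multiscale_steps(side))
```

Under these assumptions the number of sequential steps grows with the number of doublings, which is logarithmic in the pixel count, rather than with the pixel count itself. The count covers sequential steps only. The total amount of computation per step is not modeled.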
