Adversarial Training of Variational Auto-Encoders for High Fidelity Image Generation

Variational auto-encoders (VAEs) provide an attractive solution to image generation problem. However, they tend to produce blurred and over-smoothed images due to their dependence on pixel-wise reconstruction loss. This paper introduces a new approach to alleviate this problem in the VAE based generative models. Our model simultaneously learns to match the data, reconstruction loss and the latent distributions of real and fake images to improve the quality of generated samples. To compute the loss distributions, we introduce an auto-encoder based discriminator model which allows an adversarial learning procedure. The discriminator in our model also provides perceptual guidance to the VAE by matching the learned similarity metric of the real and fake samples in the latent space. To stabilize the overall training process, our model uses an error feedback approach to maintain the equilibrium between competing networks in the model. Our experiments show that the generated samples from our proposed model exhibit a diverse set of attributes and facial expressions and scale up to highresolution images very well.

[1]  Mohammed Bennamoun,et al.  Cost-Sensitive Learning of Deep Feature Representations From Imbalanced Data , 2015, IEEE Transactions on Neural Networks and Learning Systems.

[2]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[3]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[4]  Zhe Gan,et al.  Adversarial Symmetric Variational Autoencoder , 2017, NIPS.

[5]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[6]  Aaron C. Courville,et al.  Adversarially Learned Inference , 2016, ICLR.

[7]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[8]  Léon Bottou,et al.  Towards Principled Methods for Training Generative Adversarial Networks , 2017, ICLR.

[9]  Xiaogang Wang,et al.  Deep Learning Face Attributes in the Wild , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[10]  Wojciech Zaremba,et al.  Improved Techniques for Training GANs , 2016, NIPS.

[11]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Ole Winther,et al.  Autoencoding beyond pixels using a learned similarity metric , 2015, ICML.

[13]  Jonathon Shlens,et al.  Conditional Image Synthesis with Auxiliary Classifier GANs , 2016, ICML.

[14]  Alexei A. Efros,et al.  Texture synthesis by non-parametric sampling , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[15]  Christian Ledig,et al.  Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Léon Bottou,et al.  Wasserstein GAN , 2017, ArXiv.

[17]  Yoshua Bengio,et al.  Boundary-Seeking Generative Adversarial Networks , 2017, ICLR 2017.

[18]  Yann LeCun,et al.  Energy-based Generative Adversarial Network , 2016, ICLR.

[19]  Yingyu Liang,et al.  Generalization and Equilibrium in Generative Adversarial Nets (GANs) , 2017, ICML.

[20]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[21]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[22]  David Berthelot,et al.  BEGAN: Boundary Equilibrium Generative Adversarial Networks , 2017, ArXiv.

[23]  Geoffrey E. Hinton,et al.  Learning and relearning in Boltzmann machines , 1986 .

[24]  Mohammed Bennamoun,et al.  A Spatial Layout and Scale Invariant Feature Representation for Indoor Scene Classification , 2015, IEEE Transactions on Image Processing.

[25]  Alex Graves,et al.  Conditional Image Generation with PixelCNN Decoders , 2016, NIPS.

[26]  Minh N. Do,et al.  Semantic Image Inpainting with Perceptual and Contextual Losses , 2016, ArXiv.

[27]  Fatih Murat Porikli,et al.  Scene Categorization with Spectral Features , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[28]  Alexei A. Efros,et al.  Scene completion using millions of photographs , 2007, SIGGRAPH 2007.

[29]  Minh N. Do,et al.  Semantic Image Inpainting with Deep Generative Models , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Graham W. Taylor,et al.  Deconvolutional networks , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[31]  Xi Chen,et al.  PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications , 2017, ICLR.

[32]  Mohammed Bennamoun,et al.  Empowering Simple Binary Classifiers for Image Set Based Face Recognition , 2017, International Journal of Computer Vision.

[33]  Antonio Torralba,et al.  Generating Videos with Scene Dynamics , 2016, NIPS.

[34]  Mohammed Bennamoun,et al.  A Discriminative Representation of Convolutional Features for Indoor Scene Recognition , 2015, IEEE transactions on image processing : a publication of the IEEE Signal Processing Society.