Appearance and shape based image synthesis by conditional variational generative adversarial network

Abstract Person image synthesis based on shape and appearance using deep generative models opens the door to many applications, such as person re-identification (ReID) and the movie industry. Existing image synthesis methods produce the image of an object directly and therefore fail to recover spatial deformations when images are generated. In this paper, we present a conditional variational generative adversarial network (CVGAN) that synthesizes desired images guided by a target shape by modeling the inherent interplay between shape and appearance. First, the shape and appearance of the given images are disentangled by variational inference, which enables us to generate person images with arbitrary shapes. Second, to preserve details and generate photo-realistic images, the Kullback–Leibler (KL) loss is adopted to reduce the gap between the condition image and the generated image. Third, to partly prevent the gradient vanishing problem and train our framework stably, we propose a combined general learning method in which the discriminative network uses a least squares loss. We experiment on the COCO, DeepFashion and Market-1501 datasets, and the results demonstrate that CVGAN significantly improves the discriminability, diversity and quality of synthesized images over existing methods.
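
The two loss terms named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the textbook forms: a KL divergence between a diagonal Gaussian posterior and a standard normal prior, as in a standard variational autoencoder, and the usual least squares GAN objectives with target labels 1 for real and 0 for fake. The abstract does not give the exact formulation or loss weighting used in CVGAN, and the function names are hypothetical.

```python
import numpy as np


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)).

    Assumed standard VAE form: summed over latent dimensions,
    averaged over the batch. Inputs have shape (batch, latent_dim).
    """
    kl_per_sample = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=1)
    return float(np.mean(kl_per_sample))


def lsgan_discriminator_loss(d_real, d_fake):
    """Least squares discriminator loss with assumed targets real=1, fake=0."""
    return float(0.5 * np.mean((d_real - 1.0) ** 2) + 0.5 * np.mean(d_fake ** 2))


def lsgan_generator_loss(d_fake):
    """Least squares generator loss: push discriminator scores on fakes toward 1."""
    return float(0.5 * np.mean((d_fake - 1.0) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu = rng.normal(size=(4, 8))        # hypothetical encoder means
    log_var = rng.normal(size=(4, 8))   # hypothetical encoder log-variances
    d_real = rng.uniform(size=(4, 1))   # hypothetical discriminator scores on real images
    d_fake = rng.uniform(size=(4, 1))   # hypothetical discriminator scores on generated images
    print("KL term:", kl_to_standard_normal(mu, log_var))
    print("D loss:", lsgan_discriminator_loss(d_real, d_fake))
    print("G loss:", lsgan_generator_loss(d_fake))
```

Unlike a sigmoid cross-entropy loss, the squared error keeps a non-zero gradient for generated samples that the discriminator already classifies confidently, which is the usual motivation for a least squares discriminator in stabilizing training.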
