Style Separation and Synthesis via Generative Adversarial Networks

Style synthesis attracts great interests recently, while few works focus on its dual problem "style separation". In this paper, we propose the Style Separation and Synthesis Generative Adversarial Network (S3-GAN) to simultaneously implement style separation and style synthesis on object photographs of specific categories. Based on the assumption that the object photographs lie on a manifold, and the contents and styles are independent, we employ S3-GAN to build mappings between the manifold and a latent vector space for separating and synthesizing the contents and styles. The S3-GAN consists of an encoder network, a generator network, and an adversarial network. The encoder network performs style separation by mapping an object photograph to a latent vector. Two halves of the latent vector represent the content and style, respectively. The generator network performs style synthesis by taking a concatenated vector as input. The concatenated vector contains the style half vector of the style target image and the content half vector of the content target image. Once obtaining the images from the generator network, an adversarial network is imposed to generate more photo-realistic images. Experiments on CelebA and UT Zappos 50K datasets demonstrate that the S3-GAN has the capacity of style separation and synthesis simultaneously, and could capture various styles in a single model.

[1]  Alexei A. Efros,et al.  Context Encoders: Feature Learning by Inpainting , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Yu Qiao,et al.  Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks , 2016, IEEE Signal Processing Letters.

[3]  Andrea Vedaldi,et al.  Texture Networks: Feed-forward Synthesis of Textures and Stylized Images , 2016, ICML.

[4]  Leon A. Gatys,et al.  Texture Synthesis Using Convolutional Neural Networks , 2015, NIPS.

[5]  Alexei A. Efros,et al.  Image quilting for texture synthesis and transfer , 2001, SIGGRAPH.

[6]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[7]  Sung Yong Shin,et al.  On pixel-based texture synthesis by non-parametric sampling , 2006, Comput. Graph..

[8]  Guo-Jun Qi,et al.  Loss-Sensitive Generative Adversarial Networks on Lipschitz Densities , 2017, International Journal of Computer Vision.

[9]  Yongdong Zhang,et al.  One-Shot Fine-Grained Instance Retrieval , 2017, ACM Multimedia.

[10]  Chuan Li,et al.  Precomputed Real-Time Texture Synthesis with Markovian Generative Adversarial Networks , 2016, ECCV.

[11]  Li Fei-Fei,et al.  Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[12]  Dong Liu,et al.  Multi-Scale Triplet CNN for Person Re-Identification , 2016, ACM Multimedia.

[13]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[14]  Nenghai Yu,et al.  StyleBank: An Explicit Representation for Neural Image Style Transfer , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[16]  Leon A. Gatys,et al.  Image Style Transfer Using Convolutional Neural Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Kristen Grauman,et al.  Fine-Grained Visual Comparisons with Local Learning , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[19]  Nicu Sebe,et al.  Deformable GANs for Pose-Based Human Image Generation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[20]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[21]  Aaron C. Courville,et al.  Improved Training of Wasserstein GANs , 2017, NIPS.

[22]  Yann LeCun,et al.  Energy-based Generative Adversarial Network , 2016, ICLR.

[23]  Simon Osindero,et al.  Conditional Generative Adversarial Nets , 2014, ArXiv.

[24]  Xiaogang Wang,et al.  Deep Learning Face Attributes in the Wild , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[25]  Sheng Tang,et al.  Global-residual and Local-boundary Refinement Networks for Rectifying Scene Parsing Predictions , 2017, IJCAI.

[26]  Christian Ledig,et al.  Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Honglak Lee,et al.  Exploring the structure of a real-time, arbitrary neural artistic stylization network , 2017, BMVC.

[28]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Hao Li,et al.  High-Resolution Image Inpainting Using Multi-scale Neural Patch Synthesis , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Yuxin Peng,et al.  Dual Adversarial Networks for Zero-shot Cross-media Retrieval , 2018, IJCAI.

[31]  Ming-Hsuan Yang,et al.  Universal Style Transfer via Feature Transforms , 2017, NIPS.

[32]  Yongdong Zhang,et al.  Large-scale person re-identification as retrieval , 2017, 2017 IEEE International Conference on Multimedia and Expo (ICME).

[33]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Chuan Li,et al.  Combining Markov Random Fields and Convolutional Neural Networks for Image Synthesis , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[36]  Jonathon Shlens,et al.  A Learned Representation For Artistic Style , 2016, ICLR.

[37]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[38]  David Salesin,et al.  Image Analogies , 2001, SIGGRAPH.

[39]  Abhinav Gupta,et al.  Generative Image Modeling Using Style and Structure Adversarial Networks , 2016, ECCV.

[40]  Hyunsoo Kim,et al.  Learning to Discover Cross-Domain Relations with Generative Adversarial Networks , 2017, ICML.

[41]  Ming-Hsuan Yang,et al.  Diversified Texture Synthesis with Feed-Forward Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Dong Liu,et al.  Fully Convolutional Adaptation Networks for Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[43]  Yi Yang,et al.  GeneGAN: Learning Object Transfiguration and Attribute Subspace from Unpaired Data , 2017, BMVC 2017.

[44]  Yuxin Peng,et al.  Unsupervised Generative Adversarial Cross-modal Hashing , 2017, AAAI.

[45]  Luc Van Gool,et al.  Pose Guided Person Image Generation , 2017, NIPS.

[46]  Wei Shen,et al.  Learning Residual Images for Face Attribute Manipulation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[48]  Ming-Hsuan Yang,et al.  Learning to Adapt Structured Output Space for Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[49]  Sheng Tang,et al.  Scale-Adaptive Convolutions for Scene Parsing , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[50]  Yang Song,et al.  Age Progression/Regression by Conditional Adversarial Autoencoder , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Serge J. Belongie,et al.  Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[52]  Bernt Schiele,et al.  Generative Adversarial Text to Image Synthesis , 2016, ICML.