VOGUE: Try-On by StyleGAN Interpolation Optimization

Given an image of a target person and an image of another person wearing a garment, we automatically generate the target person in the given garment. At the core of our method is a pose-conditioned StyleGAN2 latent space interpolation, which seamlessly combines the areas of interest from each image: body shape, hair, and skin color are derived from the target person, while the garment with its folds, material properties, and shape comes from the garment image. By automatically optimizing the interpolation coefficients per layer in the latent space, we can perform a seamless, yet true-to-source, merging of the garment and target person. Our algorithm allows garments to deform according to the given body shape, while preserving pattern and material details. Experiments demonstrate state-of-the-art photo-realistic results at high resolution (512×512).

Work done while the first author was an intern at Google Research.
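To make the per-layer interpolation concrete, the following is a minimal sketch rather than the paper's implementation. It assumes each image has already been projected to a list of style vectors, one per generator layer, and it uses one scalar coefficient per layer. The function name `interpolate_styles`, the toy vectors, and the hand-picked coefficients are all hypothetical; the abstract states that the coefficients are found by optimization, which is not shown here.

```python
from typing import List, Sequence

# One style vector per generator layer (toy stand-in for real latent codes).
Style = List[List[float]]


def interpolate_styles(person: Style, garment: Style, coeffs: Sequence[float]) -> Style:
    """Blend two per-layer style codes using one coefficient per layer.

    A coefficient of 0 keeps the person's style at that layer;
    a coefficient of 1 takes the garment image's style instead.
    """
    if not (len(person) == len(garment) == len(coeffs)):
        raise ValueError("person, garment, and coeffs need one entry per layer")

    blended: Style = []
    for p_layer, g_layer, q in zip(person, garment, coeffs):
        if len(p_layer) != len(g_layer):
            raise ValueError("style vectors at the same layer must match in length")
        q = min(1.0, max(0.0, q))  # clamp to [0, 1] (an assumption of this sketch)
        blended.append([p + q * (g - p) for p, g in zip(p_layer, g_layer)])
    return blended


# Toy example: three layers, two channels each, hand-picked coefficients.
person = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
garment = [[4.0, 4.0], [5.0, 5.0], [6.0, 6.0]]
coeffs = [0.0, 0.5, 1.0]

mixed = interpolate_styles(person, garment, coeffs)
print(mixed)  # [[0.0, 0.0], [3.0, 3.0], [6.0, 6.0]]
```

In this toy setting the first layer stays with the target person, the last layer is taken entirely from the garment image, and the middle layer is an even mix. In the method described above, such coefficients would be optimized automatically rather than set by hand.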
