Towards Photo-Realistic Virtual Try-On by Adaptively Generating↔Preserving Image Content

Image-based visual try-on aims to transfer a target clothing image onto a reference person, and has become a hot topic in recent years. Prior work usually focuses on preserving the characteristics of the clothing image (e.g., texture, logo, embroidery) when warping it to an arbitrary human pose. However, generating photo-realistic try-on images remains a major challenge when the reference person exhibits large occlusions or challenging poses. To address this issue, we propose a novel visual try-on network, the Adaptive Content Generating and Preserving Network (ACGPN). In particular, ACGPN first predicts the semantic layout of the reference image that will be changed after try-on (e.g., long-sleeve shirt$\rightarrow$arm, arm$\rightarrow$jacket), and then determines whether the image content needs to be generated or preserved according to the predicted semantic layout, leading to photo-realistic try-on results with rich clothing details. ACGPN involves three major modules. First, a semantic layout generation module uses the semantic segmentation of the reference image to progressively predict the desired semantic layout after try-on. Second, a clothes warping module warps the clothing image according to the generated semantic layout, where a second-order difference constraint is introduced to stabilize the warping process during training. Third, an inpainting module for content fusion integrates all information (e.g., reference image, semantic layout, warped clothes) to adaptively produce each semantic part of the human body. Compared with state-of-the-art methods, ACGPN generates photo-realistic images with much better perceptual quality and richer fine details.
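
To give a rough sense of what a second-order difference constraint on a warp can look like, the sketch below penalizes the discrete second differences of a grid of warp control points, so that neighboring points are discouraged from bending sharply relative to each other. This is a generic finite-difference smoothness penalty written for illustration, not the exact constraint used in ACGPN; the function name `second_order_penalty`, the `(H, W, 2)` grid layout, and the L1 aggregation are all assumptions.

```python
import numpy as np


def second_order_penalty(points: np.ndarray) -> float:
    """Sum of absolute second differences over a control-point grid.

    points: array of shape (H, W, 2) holding (x, y) control-point positions.
    NOTE: illustrative assumption, not the paper's exact loss.
    """
    # Second difference along the row axis: p[i-1] - 2*p[i] + p[i+1]
    d2_rows = points[:-2, :, :] - 2.0 * points[1:-1, :, :] + points[2:, :, :]
    # Second difference along the column axis: p[j-1] - 2*p[j] + p[j+1]
    d2_cols = points[:, :-2, :] - 2.0 * points[:, 1:-1, :] + points[:, 2:, :]
    return float(np.abs(d2_rows).sum() + np.abs(d2_cols).sum())


# A regular 5x5 grid is evenly spaced, so its penalty is (numerically) zero.
ys, xs = np.meshgrid(np.linspace(-1, 1, 5), np.linspace(-1, 1, 5), indexing="ij")
regular = np.stack([xs, ys], axis=-1)

# Displacing a single control point breaks the even spacing and is penalized.
bent = regular.copy()
bent[2, 2] += np.array([0.3, 0.0])

penalty_regular = second_order_penalty(regular)
penalty_bent = second_order_penalty(bent)
```

In a training setup, a term of this kind would typically be added to the warping loss with a small weight, so that the predicted control points stay close to a locally affine arrangement unless the data strongly suggests otherwise.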
