FashionTex: Controllable Virtual Try-on with Text and Texture

Virtual try-on is attracting increasing research attention as a promising way to enhance the user experience of online clothes shopping. Although existing methods can generate impressive results, they require users to provide a well-designed reference image containing the target clothing, which often does not exist. To support user-friendly fashion customization in full-body portraits, we propose a multi-modal interactive setting that combines the advantages of text and texture for multi-level fashion manipulation. With a carefully designed fashion editing module and loss functions, the FashionTex framework can semantically control cloth types and local texture patterns without annotated pairwise training data. We further introduce an ID recovery module to maintain the identity of the input portrait. Extensive experiments demonstrate the effectiveness of the proposed pipeline. Code for this paper is available at https://github.com/picksh/FashionTex.
