论文信息 - VITON: An Image-Based Virtual Try-on Network

VITON: An Image-Based Virtual Try-on Network

We present an image-based VIirtual Try-On Network (VITON) without using 3D information in any form, which seamlessly transfers a desired clothing item onto the corresponding region of a person using a coarse-to-fine strategy. Conditioned upon a new clothing-agnostic yet descriptive person representation, our framework first generates a coarse synthesized image with the target clothing item overlaid on that same person in the same pose. We further enhance the initial blurry clothing area with a refinement network. The network is trained to learn how much detail to utilize from the target clothing item, and where to apply to the person in order to synthesize a photo-realistic image in which the target item deforms naturally with clear visual patterns. Experiments on our newly collected Zalando dataset demonstrate its promise in the image-based virtual try-on task over state-of-the-art generative models.1

[1] Rainer Stiefelhagen,et al. Fashion Forward: Forecasting Visual Style in Fashion , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[2] Peter V. Gehler,et al. Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image , 2016, ECCV.

[3] Dimitris N. Metaxas,et al. StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[4] Changsheng Xu,et al. Street-to-shop: Cross-scenario clothing retrieval via parts alignment and auxiliary set , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[5] Christian Ledig,et al. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.

[7] Hanqing Lu,et al. Street-to-shop: Cross-scenario clothing retrieval via parts alignment and auxiliary set , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[8] Luc Van Gool,et al. Pose Guided Person Image Generation , 2017, NIPS.

[9] Wei Shen,et al. Learning Residual Images for Face Attribute Manipulation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10] Masashi Nishiyama,et al. Virtual Fitting by Single-Shot Body Shape Estimation , 2014 .

[11] Larry S. Davis,et al. Collaborative Fashion Recommendation: A Functional Tensor Factorization Approach , 2015, ACM Multimedia.

[12] Thomas Brox,et al. U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[13] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[14] Jonathon Shlens,et al. Conditional Image Synthesis with Auxiliary Classifier GANs , 2016, ICML.

[15] Bogdan Raducanu,et al. Invertible Conditional GANs for image editing , 2016, ArXiv.

[16] Anna Hilsmann,et al. Tracking and Retexturing Cloth for Real-Time Virtual Clothing Applications , 2009, MIRAGE.

[17] Vladlen Koltun,et al. Photographic Image Synthesis with Cascaded Refinement Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[18] Wojciech Zaremba,et al. Improved Techniques for Training GANs , 2016, NIPS.

[19] Xiaogang Wang,et al. DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20] Li Fei-Fei,et al. Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[21] Serge J. Belongie,et al. Learning Visual Clothing Style with Heterogeneous Dyadic Co-Occurrences , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[22] Yu-Gang Jiang,et al. Learning Fashion Compatibility with Bidirectional LSTMs , 2017, ACM Multimedia.

[23] Jitendra Malik,et al. Shape matching and object recognition using shape contexts , 2010, 2010 3rd International Conference on Computer Science and Information Technology.

[24] Bo Zhao,et al. Memory-Augmented Attribute Manipulation Networks for Interactive Fashion Search , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25] Shuicheng Yan,et al. Clothes Co-Parsing Via Joint Image Segmentation and Labeling With Application to Clothing Retrieval , 2016, IEEE Transactions on Multimedia.

[26] Ke Gong,et al. Look into Person: Self-Supervised Structure-Sensitive Learning and a New Benchmark for Human Parsing , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27] Leon A. Gatys,et al. Image Style Transfer Using Convolutional Neural Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28] Xiangyu Zhu,et al. High-fidelity Pose and Expression Normalization for face recognition in the wild , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29] Adriana Kovashka,et al. WhittleSearch: Image search with relative attribute feedback , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[30] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.

[31] Sanja Fidler,et al. Be Your Own Prada: Fashion Synthesis with Structural Coherence , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[32] Svetlana Lazebnik,et al. Where to Buy It: Matching Street Clothing Photos in Online Shops , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[33] Bernt Schiele,et al. Generative Adversarial Text to Image Synthesis , 2016, ICML.

[34] David W. Jacobs,et al. WarpNet: Weakly Supervised Matching for Single-View Reconstruction , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35] Alexei A. Efros,et al. Generative Visual Manipulation on the Natural Image Manifold , 2016, ECCV.

[36] Nikolay Jetchev,et al. The Conditional Analogy GAN: Swapping Fashion Articles on People Images , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[37] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[38] Thomas Brox,et al. Generating Images with Perceptual Similarity Metrics based on Deep Networks , 2016, NIPS.

[39] Tamara L. Berg,et al. Paper Doll Parsing: Retrieving Similar Styles to Parse Clothing Items , 2013, 2013 IEEE International Conference on Computer Vision.

[40] Namil Kim,et al. Pixel-Level Domain Transfer , 2016, ECCV.

[41] Alexei A. Efros,et al. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[42] Yaser Sheikh,et al. OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[43] Soumith Chintala,et al. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[44] Thomas Brox,et al. Learning to generate chairs with convolutional neural networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45] Licheng Yu,et al. Detailed Garment Recovery from a Single-View Image , 2016, ArXiv.

[46] Alexei A. Efros,et al. Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47] Alan L. Yuille,et al. Zoom Better to See Clearer: Human and Object Parsing with Hierarchical Auto-Zoom Net , 2015, ECCV.

[48] Michael J. Black,et al. ClothCap , 2017, ACM Trans. Graph..

[49] Tal Hassner,et al. Effective face frontalization in unconstrained images , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50] Zhe Gan,et al. StyleNet: Generating Attractive Visual Captions with Styles , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[51] Peter V. Gehler,et al. A Generative Model of People in Clothing , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[52] Larry S. Davis,et al. Automatic Spatially-Aware Fashion Concept Discovery , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[53] Michael J. Black,et al. DRAPE , 2012, ACM Trans. Graph..