Paper Information - Towards Multi-Pose Guided Virtual Try-On Network

Towards Multi-Pose Guided Virtual Try-On Network

Virtual try-on under arbitrary human poses has significant application potential, but it also raises considerable challenges, such as self-occlusion, heavy misalignment among different poses, and complex clothing textures. Existing virtual try-on methods can only transfer clothes onto a person in a fixed pose, and their results remain unsatisfactory: they often fail to preserve person identity or texture details, and they offer limited pose diversity. This paper makes the first attempt towards a multi-pose guided virtual try-on system, which transfers clothes onto a person in diverse poses. Given an input person image, a desired clothes image, and a desired pose, the proposed Multi-pose Guided Virtual Try-On Network (MG-VTON) generates a new person image after fitting the desired clothes onto the person and manipulating the pose. MG-VTON is constructed in three stages: 1) a conditional human parsing network synthesizes a human parsing map that matches both the desired pose and the desired clothes shape; 2) a deep Warping Generative Adversarial Network (Warp-GAN) warps the desired clothes appearance into the synthesized human parsing map and alleviates the misalignment between the input human pose and the desired one; 3) a refinement render network recovers the texture details of the clothes and removes artifacts, based on multi-pose composition masks. Extensive experiments on commonly used datasets and on our newly collected benchmark, the largest for virtual try-on, demonstrate that MG-VTON significantly outperforms all state-of-the-art methods both qualitatively and quantitatively.
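The three-stage data flow described in the abstract can be sketched in a few lines of Python. This is only an illustrative wiring under stated assumptions, not the authors' implementation: the names `TryOnPipeline`, `compose`, and the toy stand-in stages are hypothetical, the exact inputs each stage receives are assumed, and the per-pixel blend `mask * clothes + (1 - mask) * coarse` is an assumed form of "composition mask" that the abstract does not spell out. Images are toy nested lists so the sketch needs no external libraries.

```python
from dataclasses import dataclass
from typing import Callable, List

# Toy grayscale image: rows of pixel values in [0, 1].
Image = List[List[float]]


def compose(mask: Image, clothes: Image, coarse: Image) -> Image:
    """Assumed blend: mask * clothes + (1 - mask) * coarse, per pixel."""
    return [
        [m * c + (1.0 - m) * k for m, c, k in zip(m_row, c_row, k_row)]
        for m_row, c_row, k_row in zip(mask, clothes, coarse)
    ]


@dataclass
class TryOnPipeline:
    """Chains three stage callables in the order the abstract lists them."""

    # Stage 1 (assumed signature): (person, clothes, pose) -> parsing map
    parse: Callable[[Image, Image, Image], Image]
    # Stage 2 (assumed signature): (person, clothes, parsing, pose) -> coarse result
    warp: Callable[[Image, Image, Image, Image], Image]
    # Stage 3 (assumed signature): (coarse, clothes, pose) -> composition mask
    refine: Callable[[Image, Image, Image], Image]

    def run(self, person: Image, clothes: Image, pose: Image) -> Image:
        parsing = self.parse(person, clothes, pose)
        coarse = self.warp(person, clothes, parsing, pose)
        mask = self.refine(coarse, clothes, pose)
        return compose(mask, clothes, coarse)


# Trivial stand-ins for the learned networks, used only to exercise the wiring.
pipeline = TryOnPipeline(
    parse=lambda person, clothes, pose: pose,            # pretend the pose is the parsing map
    warp=lambda person, clothes, parsing, pose: person,  # pretend the coarse result is the input person
    refine=lambda coarse, clothes, pose: pose,           # pretend the pose is the composition mask
)

person = [[0.0, 0.0]]
clothes = [[1.0, 1.0]]
pose = [[1.0, 0.0]]

result = pipeline.run(person, clothes, pose)
print(result)
```

With these stand-ins, the pixel where the mask is 1 takes the clothes value and the pixel where the mask is 0 keeps the coarse value. In a real system each callable would be a trained network, and the clothes passed to the blend would be the warped clothes rather than the raw clothes image.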
