Paper Information - WAS-VTON: Warping Architecture Search for Virtual Try-on Network

Despite recent progress on image-based virtual try-on, current methods are constrained by shared warping networks and thus fail to synthesize natural try-on results for clothing categories that require different warping operations. In this paper, we address this problem by finding clothing category-specific warping networks for the virtual try-on task via Neural Architecture Search (NAS). We introduce a NAS-Warping Module and carefully design a bilevel hierarchical search space to identify the optimal network-level and operation-level flow estimation architecture. Given the network-level search space, which contains different numbers of warping blocks, and the operation-level search space, which contains different convolution operations, we jointly learn a combination of repeatable warping cells and convolution operations specifically for clothing-person alignment. Moreover, we propose a NAS-Fusion Module to synthesize more natural final try-on results; it leverages particular skip connections to produce the better-fused features required for seamlessly fusing the warped clothing with the unchanged person part. We adopt an efficient and stable one-shot searching strategy to search the above two modules. Extensive experiments demonstrate that WAS-VTON significantly outperforms previous fixed-architecture try-on methods, producing more natural warping results and virtual try-on results.
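The bilevel search space described in the abstract can be illustrated with a minimal sketch. It assumes uniform sampling of candidate architectures: first a network-level choice (the number of warping blocks), then an operation-level choice (one convolution type per block). The candidate values, the names `WarpingArch`, `sample_arch`, `search`, and `toy_score`, and the scoring function are all hypothetical; the abstract does not specify them, and no supernet is trained here.

```python
import random
from dataclasses import dataclass

# Hypothetical candidates; the abstract does not list the actual choices.
NUM_BLOCK_CHOICES = (1, 2, 3)                               # network level
OP_CHOICES = ("conv3x3", "conv5x5", "dilated_conv3x3")      # operation level


@dataclass(frozen=True)
class WarpingArch:
    ops: tuple  # one convolution operation name per warping block

    @property
    def num_blocks(self):
        return len(self.ops)


def sample_arch(rng):
    """Uniformly sample the depth first, then one operation per block."""
    n = rng.choice(NUM_BLOCK_CHOICES)
    return WarpingArch(tuple(rng.choice(OP_CHOICES) for _ in range(n)))


def search(score_fn, num_samples, seed=0):
    """Sample candidates and keep the one with the highest score."""
    rng = random.Random(seed)
    candidates = [sample_arch(rng) for _ in range(num_samples)]
    return max(candidates, key=score_fn)


def toy_score(arch):
    # Placeholder for a validation metric computed per clothing category.
    # It is an assumption made only so the sketch runs end to end.
    return arch.num_blocks + sum(op == "conv3x3" for op in arch.ops)


best = search(toy_score, num_samples=200)
print(best.num_blocks, best.ops)
```

In an actual one-shot setting, `toy_score` would be replaced by evaluating each sampled architecture with weights inherited from a trained supernet, and the search would be run separately for each clothing category.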
