Unpaired Pose Guided Human Image Generation

This paper studies the task of full generative modelling of realistic images of humans, guided only by a coarse sketch of the pose, while providing control over the specific instance or type of outfit worn by the user. This is a difficult problem because the input and output domains are very different, which makes direct image-to-image translation infeasible. We propose an end-to-end trainable network under the generative adversarial framework that provides detailed control over the final appearance without requiring paired training data, and hence allows us to forgo the challenging problem of fitting 3D poses to 2D images. The model can generate novel samples conditioned on either an image taken from the target domain or a class label indicating the style of clothing (e.g., t-shirt). We thoroughly evaluate the architecture and the contributions of its individual components experimentally. Finally, we show in a large-scale perceptual study that our approach can generate realistic-looking images and that participants struggle to distinguish fake images from real samples, especially if faces are blurred.
