Pose-Guided Person Image Synthesis in the Non-Iconic Views

Generating realistic images with the guidance of reference images and human poses is challenging. Despite the success of previous works on synthesizing person images in the iconic views, no efforts are made towards the task of pose-guided image synthesis in the non-iconic views. Particularly, we find that previous models cannot handle such a complex task, where the person images are captured in the non-iconic views by commercially-available digital cameras. To this end, we propose a new framework – Multi-branch Refinement Network (MR-Net), which utilizes several visual cues, including target person poses, foreground person body and scene images parsed. Furthermore, a novel Region of Interest (RoI) perceptual loss is proposed to optimize the MR-Net. Extensive experiments on two non-iconic datasets, Penn Action and BBC-Pose, as well as an iconic dataset – Market-1501, show the efficacy of the proposed model that can tackle the problem of pose-guided person image generation from the non-iconic views. The data, models, and codes are downloadable from https://github.com/loadder/MR-Net.

[1]  Luc Van Gool,et al.  Disentangled Person Image Generation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2]  Li Fei-Fei,et al.  Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[3]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[4]  Enhong Chen,et al.  Image Denoising and Inpainting with Deep Neural Networks , 2012, NIPS.

[5]  Alex Graves,et al.  Conditional Image Generation with PixelCNN Decoders , 2016, NIPS.

[6]  Huchuan Lu,et al.  Pose-Invariant Embedding for Deep Person Re-Identification , 2017, IEEE Transactions on Image Processing.

[7]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[8]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Raymond Y. K. Lau,et al.  Least Squares Generative Adversarial Networks , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[10]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Miao Yu,et al.  Progressive Pose Attention Transfer for Person Image Generation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Alexei A. Efros,et al.  Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[13]  Sepp Hochreiter,et al.  GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.

[14]  Andrew Zisserman,et al.  Automatic and Efficient Human Pose Estimation for Sign Language Videos , 2013, International Journal of Computer Vision.

[15]  Chi-Keung Tang,et al.  Deep Video Generation, Prediction and Completion of Human Action Sequences , 2017, ECCV.

[16]  Xiaogang Wang,et al.  DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Tao Xiang,et al.  Pose-Normalized Image Generation for Person Re-identification , 2017, ECCV.

[18]  Jia-Bin Huang,et al.  Guided Image-to-Image Translation With Bi-Directional Feature Transformation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[19]  Antonio Torralba,et al.  HOGgles: Visualizing Object Detection Features , 2013, 2013 IEEE International Conference on Computer Vision.

[20]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[21]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22]  Francesc Moreno-Noguer,et al.  Unsupervised Person Image Synthesis in Arbitrary Poses , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[23]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[24]  Luc Van Gool,et al.  Pose Guided Person Image Generation , 2017, NIPS.

[25]  Guo-Jun Qi,et al.  Loss-Sensitive Generative Adversarial Networks on Lipschitz Densities , 2017, International Journal of Computer Vision.

[26]  Abhinav Gupta,et al.  Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[27]  Léon Bottou,et al.  Wasserstein GAN , 2017, ArXiv.

[28]  Hod Lipson,et al.  Understanding Neural Networks Through Deep Visualization , 2015, ArXiv.

[29]  David Berthelot,et al.  BEGAN: Boundary Equilibrium Generative Adversarial Networks , 2017, ArXiv.

[30]  Hanjiang Lai,et al.  Soft-Gated Warping-GAN for Pose-Guided Person Image Synthesis , 2018, NeurIPS.

[31]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[32]  Chih-Yuan Yang,et al.  Single-Image Super-Resolution: A Benchmark , 2014, ECCV.

[33]  Jiawei He,et al.  Probabilistic Video Generation using Holistic Attribute Control , 2018, ECCV.

[34]  Ting-Chun Wang,et al.  Image Inpainting for Irregular Holes Using Partial Convolutions , 2018, ECCV.

[35]  Frédo Durand,et al.  Synthesizing Images of Humans in Unseen Poses , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[36]  Ole Winther,et al.  Ladder Variational Autoencoders , 2016, NIPS.

[37]  Hong-Han Shuai,et al.  FashionOn: Semantic-guided Image-based Virtual Try-on with Detailed Human and Clothing Information , 2019, ACM Multimedia.

[38]  Vladlen Koltun,et al.  Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[39]  Nicu Sebe,et al.  Deformable GANs for Pose-Based Human Image Generation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[40]  Xiaoou Tang,et al.  Image Super-Resolution Using Deep Convolutional Networks , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[42]  Thomas S. Huang,et al.  Generative Image Inpainting with Contextual Attention , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[43]  Feiping Nie,et al.  Person Reidentification via Multi-Feature Fusion With Adaptive Graph Learning , 2020, IEEE Transactions on Neural Networks and Learning Systems.

[44]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[45]  Alexei A. Efros,et al.  Context Encoders: Feature Learning by Inpainting , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[47]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[48]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[49]  Yu Wu,et al.  Pose-Guided Feature Alignment for Occluded Person Re-Identification , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[50]  Qi Tian,et al.  Scalable Person Re-identification: A Benchmark , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[51]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[52]  Wen-Huang Cheng,et al.  What Dress Fits Me Best?: Fashion Recommendation on the Clothing Style for Personal Body Shape , 2018, ACM Multimedia.

[53]  Liang Zheng,et al.  Learning Part-based Convolutional Features for Person Re-Identification , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[54]  Koray Kavukcuoglu,et al.  Pixel Recurrent Neural Networks , 2016, ICML.

[55]  Alexei A. Efros,et al.  The Unreasonable Effectiveness of Deep Features as a Perceptual Metric , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[56]  Weiyu Zhang,et al.  From Actemes to Action: A Strongly-Supervised Representation for Detailed Action Understanding , 2013, 2013 IEEE International Conference on Computer Vision.