Paper information - Pose Guided Person Image Generation

Pose Guided Person Image Generation

This paper proposes the Pose Guided Person Generation Network (PG$^2$), which synthesizes images of a person in arbitrary poses from a single image of that person and a target pose. Our generation framework PG$^2$ uses the pose information explicitly and consists of two key stages: pose integration and image refinement. In the first stage, the condition image and the target pose are fed into a U-Net-like network to generate an initial, coarse image of the person in the target pose. The second stage then refines this blurry initial result by training a U-Net-like generator adversarially. Extensive experimental results on both 128$\times$64 re-identification images and 256$\times$256 fashion photos show that our model generates high-quality person images with convincing details.
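The two-stage data flow described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the pose is encoded or how the second stage combines its output with the coarse image, so the per-keypoint heatmap encoding, the `radius` value, and the residual addition below are all assumptions. The names `pose_to_heatmaps`, `generate`, `crop_rgb`, and `zero_residual` are hypothetical, and the two stand-in functions only take the place of the trained U-Net-like networks.

```python
from typing import Callable, List, Sequence, Tuple

# An image is a nested list indexed as [channel][row][column].
Image = List[List[List[float]]]
Stage = Callable[[Image], Image]


def pose_to_heatmaps(keypoints: Sequence[Tuple[int, int]],
                     height: int, width: int, radius: int = 4) -> Image:
    """Assumed pose encoding: one binary map per (x, y) keypoint,
    set to 1.0 within `radius` pixels of the joint and 0.0 elsewhere."""
    return [
        [[1.0 if (x - kx) ** 2 + (y - ky) ** 2 <= radius ** 2 else 0.0
          for x in range(width)]
         for y in range(height)]
        for kx, ky in keypoints
    ]


def generate(condition: Image, keypoints: Sequence[Tuple[int, int]],
             stage1: Stage, stage2: Stage) -> Tuple[Image, Image]:
    """Run both stages and return (coarse, refined)."""
    height, width = len(condition[0]), len(condition[0][0])
    pose = pose_to_heatmaps(keypoints, height, width)

    # Stage 1 (pose integration): condition image and target pose are
    # concatenated along the channel axis and mapped to a coarse image.
    coarse = stage1(condition + pose)

    # Stage 2 (image refinement): assumed here to predict a residual
    # that is added to the coarse image.
    residual = stage2(condition + coarse)
    refined = [
        [[c + r for c, r in zip(c_row, r_row)]
         for c_row, r_row in zip(c_ch, r_ch)]
        for c_ch, r_ch in zip(coarse, residual)
    ]
    return coarse, refined


def crop_rgb(x: Image) -> Image:
    """Hypothetical stand-in for the stage-1 network: copies the first 3 channels."""
    return [[row[:] for row in channel] for channel in x[:3]]


def zero_residual(x: Image) -> Image:
    """Hypothetical stand-in for the stage-2 generator: an all-zero residual."""
    height, width = len(x[0]), len(x[0][0])
    return [[[0.0] * width for _ in range(height)] for _ in range(3)]


if __name__ == "__main__":
    # A tiny 3-channel, 16x8 condition image and two target keypoints.
    condition = [[[0.5] * 8 for _ in range(16)] for _ in range(3)]
    coarse, refined = generate(condition, [(4, 8), (2, 3)], crop_rgb, zero_residual)
    print(len(refined), len(refined[0]), len(refined[0][0]))
```

In a real system the two stand-ins would be replaced by trained networks, with the second one optimized against a discriminator; the sketch only shows which inputs each stage sees.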
