3D Information Guided Motion Transfer via Sequential Image Based Human Model Refinement and Face-Attention GAN

Human motions in images and videos can be regarded as deformation processes of a person's appearance, so motion transfer is usually treated as a pose-guided image generation task performed in the 2D image plane. However, generation in the 2D image plane lacks guidance from the original 3D motion information, which causes blur and shape distortion in the generated motion images. We therefore propose to simulate how real motion images are formed: 3D human models are reconstructed from the training motion images, driven with the target poses, and projected onto the 2D plane. These 2D projections serve as the pose representations fed into the generation model, since they naturally inherit the 3D information of the original motions. Because single-image-based human model reconstruction is unreliable on invisible surfaces, we propose a sequential-image-based human model refinement module that exploits the complementary information between adjacent motion frames to refine the 3D human model. Furthermore, because the faces in the generated motion images strongly influence the overall performance, we propose a face-attention GAN for the final motion transfer, in which a Gaussian distribution is used to match the elliptical face region and a face enhancement loss function is designed. The generated motion images, with reliable depth information, accurate shapes, and clear faces, demonstrate the effectiveness of the proposed method.
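The abstract does not specify the camera model or renderer used to project the driven 3D human model onto the image plane. As a minimal sketch, assuming a pinhole camera with focal length `focal` and principal point `(cx, cy)`, and vertices that are already posed and expressed in camera coordinates, the projection and a per-pixel nearest-depth map can be computed as below. The point-splat z-buffer is a crude stand-in for full mesh rasterization, and the function names `project_vertices` and `splat_depth_map` are hypothetical, not taken from the paper.

```python
import numpy as np


def project_vertices(verts, focal, cx, cy):
    """Pinhole projection of camera-space vertices (N, 3) to pixel coordinates.

    Hypothetical camera parameters (not from the paper). Returns (uv, z):
    uv has shape (N, 2) ordered as (column, row); z has shape (N,).
    """
    verts = np.asarray(verts, dtype=np.float64)
    z = verts[:, 2]
    u = focal * verts[:, 0] / z + cx
    v = focal * verts[:, 1] / z + cy
    return np.stack([u, v], axis=1), z


def splat_depth_map(verts, focal, cx, cy, height, width):
    """Rasterize projected vertices into an (H, W) depth map, keeping the nearest depth.

    A point-splat stand-in for mesh rendering (an assumption); empty pixels are 0.
    """
    verts = np.asarray(verts, dtype=np.float64)
    verts = verts[verts[:, 2] > 0]  # drop points behind the camera before dividing by z
    uv, z = project_vertices(verts, focal, cx, cy)
    cols = np.round(uv[:, 0]).astype(np.int64)
    rows = np.round(uv[:, 1]).astype(np.int64)
    inside = (cols >= 0) & (cols < width) & (rows >= 0) & (rows < height)
    depth = np.full((height, width), np.inf)
    # np.minimum.at is unbuffered, so several vertices hitting one pixel keep the closest one
    np.minimum.at(depth, (rows[inside], cols[inside]), z[inside])
    depth[np.isinf(depth)] = 0.0
    return depth


# Two vertices share pixel (32, 32); the nearer one (z = 2.0) wins the z-test.
verts = np.array([[0.0, 0.0, 2.0], [0.0, 0.0, 3.0], [0.5, -0.5, 2.5]])
depth = splat_depth_map(verts, focal=100.0, cx=32.0, cy=32.0, height=64, width=64)
print(depth[32, 32], depth[12, 52])  # 2.0 2.5
```

Unlike a 2D skeleton, a depth-carrying projection like this keeps occlusion order and body shape in the pose representation, which is the property the abstract relies on. A practical pipeline would rasterize mesh faces (for example with a differentiable renderer) rather than individual vertices.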

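The abstract states that a Gaussian distribution is used to match the elliptical face region and that a face enhancement loss is added, but it gives neither formula. The sketch below assumes an anisotropic 2D Gaussian parameterized by a face center `(x0, y0)`, two spreads `(su, sv)` along the ellipse axes, and a rotation angle, so that its iso-contours are ellipses; the loss is assumed to be a global L1 term plus a Gaussian-weighted L1 term. The names `elliptical_gaussian_map` and `face_enhanced_l1`, and the default `face_weight=10.0`, are illustrative assumptions rather than the paper's definitions.

```python
import numpy as np


def elliptical_gaussian_map(height, width, center, sigmas, angle=0.0):
    """(H, W) weight map exp(-0.5 * ((u/su)^2 + (v/sv)^2)), peak 1 at the face center.

    center: (x0, y0) in pixels; sigmas: (su, sv) spreads along the ellipse axes;
    angle: ellipse rotation in radians. Iso-contours of the map are ellipses.
    """
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float64)
    dx, dy = xs - center[0], ys - center[1]
    cos_t, sin_t = np.cos(angle), np.sin(angle)
    # Rotate pixel offsets into the ellipse's local (u, v) frame.
    u = cos_t * dx + sin_t * dy
    v = -sin_t * dx + cos_t * dy
    return np.exp(-0.5 * ((u / sigmas[0]) ** 2 + (v / sigmas[1]) ** 2))


def face_enhanced_l1(generated, target, face_map, face_weight=10.0):
    """Global L1 plus a face-weighted L1 term (hypothetical face enhancement loss).

    generated/target: (H, W) or (H, W, C) arrays; face_map: (H, W) weights.
    """
    err = np.abs(np.asarray(generated, float) - np.asarray(target, float))
    w = face_map[..., None] if err.ndim == 3 else face_map
    w = np.broadcast_to(w, err.shape)
    face_term = np.sum(w * err) / (np.sum(w) + 1e-8)  # weighted mean error on the face
    return err.mean() + face_weight * face_term


face_map = elliptical_gaussian_map(64, 64, center=(32, 20), sigmas=(6, 9))
target = np.zeros((64, 64, 3))
print(face_enhanced_l1(target, target, face_map))  # 0.0
```

A smooth Gaussian weight, instead of a hard elliptical mask, gives the face center the largest weight while letting the emphasis decay gradually toward the face boundary. In practice the center and spreads might be derived from detected facial keypoints, and a training implementation would express the same arithmetic with a deep learning framework's tensors so that gradients flow.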