Human Action Transfer Based on 3D Model Reconstruction

We present a practical and effective method for human action transfer. Given a source action sequence and limited target information, we aim to transfer the motion from the source to the target. Although recent GAN- and VAE-based works have achieved impressive results for 2D action transfer, several problems remain hard to avoid, such as distorted or discontinuous human body shapes and blurry clothing texture. In this paper, we address these problems from a novel 3D viewpoint. On the one hand, we design a skeleton-to-3D-mesh generator to produce the 3D model, which substantially improves appearance reconstruction; we further add a temporal connection to improve the smoothness of the generated motion. On the other hand, instead of directly using the image in RGB space, we transform the target appearance information into UV space for the subsequent pose transformation. Specifically, unlike conventional graphics rendering, which directly projects visible pixels into UV space, our transformation is guided by each pixel's semantic information. We evaluate the pose generator on Human3.6M and HumanEva-I. Both qualitative and quantitative results show that our method outperforms 2D generation-based methods. In addition, we compare our rendering method with graphics-based rendering on Human3.6M and People-snapshot; the results show that our rendering method is more robust and effective.
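
To make the semantic UV transformation concrete, the following is a minimal NumPy sketch, not the paper's implementation, of how target appearance pixels could be scattered into a per-part UV texture atlas. It assumes DensePose-style IUV maps, where each foreground pixel carries a body-part index and (U, V) coordinates; the function and parameter names are hypothetical.

```python
import numpy as np

def image_to_uv_texture(image, iuv, texture_size=256, num_parts=24):
    """Scatter RGB pixels into per-part UV textures using each pixel's
    semantic information (body-part index) and UV coordinates.

    image: (H, W, 3) uint8 RGB frame of the target person.
    iuv:   (H, W, 3) DensePose-style map; channel 0 is the part index
           (0 = background, 1..num_parts = body parts), channels 1-2
           are the U and V coordinates in [0, 1].
    Returns a (num_parts, texture_size, texture_size, 3) texture atlas.
    """
    textures = np.zeros((num_parts, texture_size, texture_size, 3), dtype=np.uint8)
    part = iuv[..., 0].astype(int)
    u = np.clip((iuv[..., 1] * (texture_size - 1)).astype(int), 0, texture_size - 1)
    v = np.clip((iuv[..., 2] * (texture_size - 1)).astype(int), 0, texture_size - 1)

    mask = part > 0  # keep only foreground (body) pixels
    textures[part[mask] - 1, v[mask], u[mask]] = image[mask]
    return textures
```

A new pose could then be textured by the inverse lookup: for each pixel of the target pose's IUV map, fetch the RGB value from the corresponding (part, V, U) location in this atlas.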
