Input Reconstruction Side and top down view Part Segmentation Input Reconstruction Side and top down view Part Segmentation

We describe Human Mesh Recovery (HMR), an end-toend framework for reconstructing a full 3D mesh of a human body from a single RGB image. In contrast to most current methods that compute 2D or 3D joint locations, we produce a richer and more useful mesh representation that is parameterized by shape and 3D joint angles. The main objective is to minimize the reprojection loss of keypoints, which allows our model to be trained using in-the-wild images that only have ground truth 2D annotations. However, the reprojection loss alone is highly underconstrained. In this work we address this problem by introducing an adversary trained to tell whether human body shape and pose parameters are real or not using a large database of 3D human meshes. We show that HMR can be trained with and without using any paired 2D-to-3D supervision. We do not rely on intermediate 2D keypoint detections and infer 3D pose and shape parameters directly from image pixels. Our model runs in real-time given a bounding box containing the person. We demonstrate our approach on various images in-the-wild and out-perform previous optimizationbased methods that output 3D meshes and show competitive results on tasks such as 3D joint location estimation and part segmentation.

[1]  Ersin Yumer,et al.  Self-supervised Learning of Motion Capture , 2017, NIPS.

[2]  Cordelia Schmid,et al.  LCR-Net: Localization-Classification-Regression for Human Pose , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Katerina Fragkiadaki,et al.  Adversarial Inverse Graphics Networks: Learning 2D-to-3D Lifting and Image-to-Image Translation from Unpaired Supervision , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[4]  James J. Little,et al.  A Simple Yet Effective Baseline for 3d Human Pose Estimation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[5]  Hans-Peter Seidel,et al.  VNect , 2017, ACM Trans. Graph..

[6]  Yichen Wei,et al.  Weakly-supervised Transfer for 3D Human Pose Estimation in the Wild , 2017, ArXiv.

[7]  Yichen Wei,et al.  Compositional Human Pose Regression , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[8]  Alexei A. Efros,et al.  Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[9]  Peter V. Gehler,et al.  Unite the People: Closing the Loop Between 3D and 2D Human Representations , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Cordelia Schmid,et al.  Learning from Synthetic Humans , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Lourdes Agapito,et al.  Lifting from the Deep: Convolutional 3D Pose Estimation from a Single Image , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Iasonas Kokkinos,et al.  DenseReg: Fully Convolutional Dense Shape Regression In-the-Wild , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Pascal Fua,et al.  Monocular 3D Human Pose Estimation in the Wild Using Improved CNN Supervision , 2016, 2017 International Conference on 3D Vision (3DV).

[14]  Francesc Moreno-Noguer,et al.  3D Human Pose Estimation from a Single Image via Distance Matrix Regression , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Xiaowei Zhou,et al.  Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Pascal Fua,et al.  Learning to Fuse 2D and 3D Image Cues for Monocular Body Pose Estimation , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[17]  Xiaowei Zhou,et al.  Sparse Representation for 3D Shape Estimation: A Convex Relaxation Approach , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Ignas Budvytis,et al.  Indirect deep structured learning for 3D human body shape and pose prediction , 2017, BMVC.

[19]  Fiora Pirri,et al.  Bayesian Image Based 3D Pose Estimation , 2016, ECCV.

[20]  Wei Zhang,et al.  Deep Kinematic Pose Regression , 2016, ECCV Workshops.

[21]  Peter V. Gehler,et al.  Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image , 2016, ECCV.

[22]  Cordelia Schmid,et al.  MoCap-guided Data Augmentation for 3D Pose Estimation in the Wild , 2016, NIPS.

[23]  Yuandong Tian,et al.  Single Image 3D Interpreter Network , 2016, ECCV.

[24]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[25]  Jian Sun,et al.  Identity Mappings in Deep Residual Networks , 2016, ECCV.

[26]  Varun Ramakrishna,et al.  Convolutional Pose Machines , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Xiaowei Zhou,et al.  Sparseness Meets Deepness: 3D Human Pose Estimation from Monocular Video , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Jitendra Malik,et al.  Human Pose Estimation with Iterative Error Feedback , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Vincent Lepetit,et al.  Training a Feedback Loop for Hand Pose Estimation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[30]  Michael J. Black,et al.  SMPL: A Skinned Multi-Person Linear Model , 2023 .

[31]  Michael J. Black,et al.  Pose-conditioned joint angle limits for 3D human pose reconstruction , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Joshua B. Tenenbaum,et al.  Picture: A probabilistic programming language for scene perception , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[34]  Jitendra Malik,et al.  Viewpoints and keypoints , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[36]  Michael J. Black,et al.  MoSh: motion and shape capture from sparse markers , 2014, ACM Trans. Graph..

[37]  Cristian Sminchisescu,et al.  Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Bernt Schiele,et al.  2D Human Pose Estimation: New Benchmark and State of the Art Analysis , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[39]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[40]  Christian Szegedy,et al.  DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  T. Kanade,et al.  Reconstructing 3D Human Pose from 2D Image Landmarks , 2012, ECCV.

[42]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[43]  Yu Chen,et al.  Inferring 3D Shapes and Deformations from Single Views , 2010, ECCV.

[44]  Pietro Perona,et al.  Cascaded pose regression , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[45]  Hans-Peter Seidel,et al.  Multilinear pose and body shape estimation of dressed subjects from image sets , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[46]  Mark Everingham,et al.  Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation , 2010, BMVC.

[47]  Michael J. Black,et al.  HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion , 2010, International Journal of Computer Vision.

[48]  Michael J. Black,et al.  Estimating human shape and pose from a single image , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[49]  R. Chellappa,et al.  View independent human body pose estimation from a single perspective image , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[50]  Ioannis A. Kakadiaris,et al.  Estimating Anthropometry and Pose from a Single Uncalibrated Image , 2001, Comput. Vis. Image Underst..

[51]  Camillo J. Taylor,et al.  Reconstruction of articulated objects from point correspondences in a single uncalibrated image , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[52]  Hsi-Jian Lee,et al.  Determination of 3D human body postures from a single view , 1985, Comput. Vis. Graph. Image Process..

[53]  J. Gower Generalized procrustes analysis , 1975 .