3D Human Body Reconstruction from a Single Image via Volumetric Regression

This paper proposes the use of an end-to-end Convolutional Neural Network for direct reconstruction of the 3D geometry of humans via volumetric regression. The proposed method does not require the fitting of a shape model and can be trained to work from a variety of input types, whether it be landmarks, images or segmentation masks. Additionally, non-visible parts, either self-occluded or otherwise, are still reconstructed, which is not the case with depth map regression. We present results that show that our method can handle both pose variation and detailed reconstruction given appropriate datasets for training.

[1]  Zhenhua Wang,et al.  Synthesizing Training Images for Boosting Human 3D Pose Estimation , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[2]  Cristian Sminchisescu,et al.  Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Cristian Sminchisescu,et al.  Latent structured models for human pose estimation , 2011, 2011 International Conference on Computer Vision.

[4]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Michael J. Black,et al.  Detailed Human Shape and Pose from Images , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Michael J. Black,et al.  Estimating human shape and pose from a single image , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[7]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[8]  Nojun Kwak,et al.  3D Human Pose Estimation Using Convolutional Neural Networks with 2D Pose Information , 2016, ECCV Workshops.

[9]  Peter V. Gehler,et al.  Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image , 2016, ECCV.

[10]  Sudeep Sarkar,et al.  Learning Camera Viewpoint Using CNN to Improve 3D Body Pose Estimation , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[11]  T. Kanade,et al.  Reconstructing 3D Human Pose from 2D Image Landmarks , 2012, ECCV.

[12]  Vincent Lepetit,et al.  Structured Prediction of 3D Human Pose with Deep Neural Networks , 2016, BMVC.

[13]  Antoni B. Chan,et al.  3D Human Pose Estimation from Monocular Images with Deep Convolutional Neural Network , 2014, ACCV.

[14]  Cordelia Schmid,et al.  Learning from Synthetic Humans , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Vincent Lepetit,et al.  Direct Prediction of 3D Body Poses from Motion Compensated Sequences , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Michael J. Black,et al.  SMPL: A Skinned Multi-Person Linear Model , 2023 .

[17]  Michael J. Black,et al.  Combined discriminative and generative articulated pose and non-rigid shape estimation , 2007, NIPS.

[18]  Sebastian Thrun,et al.  SCAPE: shape completion and animation of people , 2005, SIGGRAPH '05.

[19]  Xiaowei Zhou,et al.  Sparseness Meets Deepness: 3D Human Pose Estimation from Monocular Video , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Xiaowei Zhou,et al.  Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Hao Jiang 3D Human Pose Reconstruction Using Millions of Exemplars , 2010, 2010 20th International Conference on Pattern Recognition.

[22]  Georgios Tzimiropoulos,et al.  Human Pose Estimation via Convolutional Part Heatmap Regression , 2016, ECCV.

[23]  Reinhard Koch,et al.  Human Model Fitting from Monocular Posture Images , 2005 .

[24]  Tao Mei,et al.  Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[25]  Yaser Sheikh,et al.  Total Capture: A 3D Deformation Model for Tracking Faces, Hands, and Bodies , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[26]  Yu Chen,et al.  Inferring 3D Shapes and Deformations from Single Views , 2010, ECCV.

[27]  Hans-Peter Seidel,et al.  VNect , 2017, ACM Trans. Graph..

[28]  Michael J. Black,et al.  Dyna: a model of dynamic human shape in motion , 2015, ACM Trans. Graph..

[29]  Georgios Tzimiropoulos,et al.  Binarized Convolutional Landmark Localizers for Human Pose Estimation and Face Alignment with Limited Resources , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[30]  William E. Lorensen,et al.  Marching cubes: A high resolution 3D surface construction algorithm , 1987, SIGGRAPH.

[31]  Georgios Tzimiropoulos,et al.  Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[32]  Wei Zhang,et al.  Deep Kinematic Pose Regression , 2016, ECCV Workshops.

[33]  Cristian Sminchisescu,et al.  Monocular 3D Pose and Shape Estimation of Multiple People in Natural Scenes: The Importance of Multiple Scene Constraints , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.