Learning Nonparametric Human Mesh Reconstruction From A Single Image Without Ground Truth Meshes

Nonparametric approaches have shown promising results on reconstructing 3D human mesh from a single monocular image. Unlike previous approaches that use a parametric human model like skinned multi-person linear model (SMPL), and attempt to regress the model parameters, nonparametric approaches relax the heavy reliance on the parametric space. However, existing nonparametric methods require ground truth meshes as their regression target for each vertex, and obtaining ground truth mesh labels is very expensive. In this paper, we propose a novel approach to learn human mesh reconstruction without any ground truth meshes. This is made possible by introducing two new terms into the loss function of a graph convolutional neural network (Graph CNN). The first term is the Laplacian prior that acts as a regularizer on the reconstructed mesh. The second term is the part segmentation loss that forces the projected region of the reconstructed mesh to match the part segmentation. Experimental results on multiple public datasets show that without using 3D ground truth meshes, the proposed approach outperforms the previous state-of-the-art approaches that require ground truth meshes for training.

[1]  Kun Zhou,et al.  Large mesh deformation using the volumetric graph Laplacian , 2005, ACM Trans. Graph..

[2]  Iasonas Kokkinos,et al.  DenseReg: Fully Convolutional Dense Shape Regression In-the-Wild , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Alec R. Rivers,et al.  3D modeling with silhouettes , 2010, ACM Trans. Graph..

[4]  Li Fei-Fei,et al.  Image Generation from Scene Graphs , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[5]  Qiang Li,et al.  End-to-End Hand Mesh Recovery From a Monocular RGB Image , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[6]  Edmond Boyer,et al.  FeaStNet: Feature-Steered Graph Convolutions for 3D Shape Analysis , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7]  Cheng Li,et al.  Delving Deep Into Hybrid Annotations for 3D Human Recovery in the Wild , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[8]  Yaser Sheikh,et al.  Total Capture: A 3D Deformation Model for Tracking Faces, Hands, and Bodies , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[9]  Michael J. Black,et al.  Generating 3D faces using Convolutional Mesh Autoencoders , 2018, ECCV.

[10]  Sanja Fidler,et al.  Detect What You Can: Detecting and Representing Objects Using Holistic Models and Body Parts , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Alexander M. Bronstein,et al.  Deformable Shape Completion with Graph Convolutional Autoencoders , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[12]  Kostas Daniilidis,et al.  Convolutional Mesh Regression for Single-Image Human Shape Reconstruction , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Marc Alexa,et al.  Laplacian mesh optimization , 2006, GRAPHITE '06.

[14]  Tao Yu,et al.  HybridFusion: Real-Time Performance Capture Using a Single Depth Sensor and Sparse IMUs , 2018, ECCV.

[15]  Hao Li,et al.  PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[16]  Jitendra Malik,et al.  End-to-End Recovery of Human Shape and Pose , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[17]  Bodo Rosenhahn,et al.  Supplementary Material to: Recovering Accurate 3D Human Pose in The Wild Using IMUs and a Moving Camera , 2018 .

[18]  Michael J. Black,et al.  Deep inertial poser , 2018, ACM Trans. Graph..

[19]  Sanja Fidler,et al.  3D Graph Neural Networks for RGBD Semantic Segmentation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[20]  Peter V. Gehler,et al.  Neural Body Fitting: Unifying Deep Learning and Model Based Human Pose and Shape Estimation , 2018, 2018 International Conference on 3D Vision (3DV).

[21]  Cordelia Schmid,et al.  BodyNet: Volumetric Inference of 3D Human Body Shapes , 2018, ECCV.

[22]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[23]  Gabriel Taubin,et al.  A signal processing approach to fair surface design , 1995, SIGGRAPH.

[24]  Ersin Yumer,et al.  Self-supervised Learning of Motion Capture , 2017, NIPS.

[25]  James J. Little,et al.  A Simple Yet Effective Baseline for 3d Human Pose Estimation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[26]  Jianfei Cai,et al.  3D Hand Shape and Pose Estimation from a Single RGB Image (Supplementary Material) , 2019 .

[27]  Cristian Sminchisescu,et al.  Monocular 3D Pose and Shape Estimation of Multiple People in Natural Scenes: The Importance of Multiple Scene Constraints , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[28]  Luc Van Gool,et al.  Self-Supervised 3D Hand Pose Estimation Through Training by Fitting , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Hao Li,et al.  SiCloPe: Silhouette-Based Clothed People , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Ruigang Yang,et al.  Detailed Human Shape Estimation From a Single Image by Hierarchical Mesh Deformation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Tatsuya Harada,et al.  Neural 3D Mesh Renderer , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32]  Xiaowei Zhou,et al.  MonoCap: Monocular Human Motion Capture using a CNN Coupled with a Geometric Prior , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Kostas Daniilidis,et al.  TexturePose: Supervising Human Mesh Estimation With Texture Consistency , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[34]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[35]  Richard Szeliski,et al.  A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[36]  Jitendra Malik,et al.  Learning Category-Specific Mesh Reconstruction from Image Collections , 2018, ECCV.

[37]  Mark Meyer,et al.  Implicit fairing of irregular meshes using diffusion and curvature flow , 1999, SIGGRAPH.

[38]  Song-Chun Zhu,et al.  Learning Human-Object Interactions by Graph Parsing Neural Networks , 2018, ECCV.

[39]  Peter V. Gehler,et al.  Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image , 2016, ECCV.

[40]  Mark Everingham,et al.  Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation , 2010, BMVC.

[41]  Christian Rössl,et al.  Laplacian surface editing , 2004, SGP '04.

[42]  Andrew W. Fitzgibbon,et al.  KinectFusion: Real-time dense surface mapping and tracking , 2011, 2011 10th IEEE International Symposium on Mixed and Augmented Reality.

[43]  Jitendra Malik,et al.  Learning 3D Human Dynamics From Video , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Iasonas Kokkinos,et al.  DensePose: Dense Human Pose Estimation in the Wild , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[45]  Xiaowei Zhou,et al.  Learning to Estimate 3D Human Pose and Shape from a Single Color Image , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[46]  Zicheng Liu,et al.  Cross-Domain Complementary Learning with Synthetic Data for Multi-Person Part Segmentation , 2019, ArXiv.

[47]  Peter V. Gehler,et al.  Unite the People: Closing the Loop Between 3D and 2D Human Representations , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Charless C. Fowlkes,et al.  3D Scene Reconstruction With Multi-Layer Depth and Epipolar Transformers , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[49]  Wei Liu,et al.  Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images , 2018, ECCV.

[50]  Philip S. Yu,et al.  A Comprehensive Survey on Graph Neural Networks , 2019, IEEE Transactions on Neural Networks and Learning Systems.

[51]  Silvio Savarese,et al.  Structural-RNN: Deep Learning on Spatio-Temporal Graphs , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Michael J. Black,et al.  SMPL: A Skinned Multi-Person Linear Model , 2023 .

[53]  Jean Ponce,et al.  Accurate, Dense, and Robust Multiview Stereopsis , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[54]  Dahua Lin,et al.  Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition , 2018, AAAI.

[55]  Joan Bruna,et al.  Few-Shot Learning with Graph Neural Networks , 2017, ICLR.

[56]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[57]  Bernt Schiele,et al.  2D Human Pose Estimation: New Benchmark and State of the Art Analysis , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[58]  Ira Kemelmacher-Shlizerman,et al.  Photo Wake-Up: 3D Character Animation From a Single Photo , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[59]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[60]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[61]  Danfei Xu,et al.  Scene Graph Generation by Iterative Message Passing , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[62]  Stefan Lee,et al.  Graph R-CNN for Scene Graph Generation , 2018, ECCV.

[63]  Cristian Sminchisescu,et al.  Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[64]  Iasonas Kokkinos,et al.  HoloPose: Holistic 3D Human Reconstruction In-The-Wild , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).