DIMNet: Dense implicit function network for 3D human body reconstruction

Abstract In recent years, with the improvement of artificial intelligence technology, it has become possible to reconstruct high-precision 3D human body models based on ordinary RGB images. The current 3D human body reconstruction technology requires complex external equipment to scan all angles of the human body, which is complicated to be implemented and cannot be popularized. In order to solve this problem, this paper applies deep learning models on reconstructing 3D human body based on monocular images. First of all, this paper uses Stacked Hourglass network to perform convolution operations on monocular images collected from different views. Then Multi-Layer Perceptrons (MLPs) are used to decode the encoded high-level images. The feature codes in the two views(main and side) are fused, and the interior and exterior points are classified by the fusion features, so as to obtain the corresponding 3D occupancy field. At last, the Marching Cube algorithm is used for 3D reconstruction with a specific threshold and then we use Laplace smoothing algorithm to remove artifacts. This paper proposes a dense sampling strategy based on the important joint points of the human body, which has a certain optimization effect on the realization of high-precision 3D reconstruction. The performance of the proposed scheme has been validated on the open source datasets, MGN dataset and the THuman dataset, provided by Tsinghua University. The proposed scheme can reconstruct features such as clothing folds, color textures, and facial details,and has great potential to be applied in different applications.

[1]  Michael J. Black,et al.  SMPL: A Skinned Multi-Person Linear Model , 2023 .

[2]  Christian Theobalt,et al.  Multi-Garment Net: Learning to Dress 3D People From Images , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[3]  Richard A. Newcombe,et al.  DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Marcus A. Magnor,et al.  Detailed Human Avatars from Monocular Video , 2018, 2018 International Conference on 3D Vision (3DV).

[5]  Sameh Zarif,et al.  Single-View 3D reconstruction: A Survey of deep learning methods , 2021, Comput. Graph..

[6]  Sebastian Nowozin,et al.  Occupancy Networks: Learning 3D Reconstruction in Function Space , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Georgios Tzimiropoulos,et al.  3D Human Body Reconstruction from a Single Image via Volumetric Regression , 2018, ECCV Workshops.

[8]  Gerard Pons-Moll,et al.  360-Degree Textures of People in Clothing from a Single Image , 2019, 2019 International Conference on 3D Vision (3DV).

[9]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Vladimir Kolmogorov,et al.  "GrabCut": interactive foreground extraction using iterated graph cuts , 2004, ACM Trans. Graph..

[11]  King Ngi Ngan,et al.  3-D Reconstruction of Human Body Shape From a Single Commodity Depth Camera , 2019, IEEE Transactions on Multimedia.

[12]  Bharat Lal Bhatnagar,et al.  LoopReg: Self-supervised Learning of Implicit Surface Correspondences, Pose and Shape for 3D Human Mesh Registration , 2020, NeurIPS.

[13]  Bharat Lal Bhatnagar,et al.  Combining Implicit Function Learning and Parametric Models for 3D Human Reconstruction , 2020, ECCV.

[14]  Jan Kautz,et al.  High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[15]  Hao Zhang,et al.  Learning Implicit Fields for Generative Shape Modeling , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Marcus A. Magnor,et al.  Learning to Reconstruct People in Clothing From a Single RGB Camera , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Jitendra Malik,et al.  End-to-End Recovery of Human Shape and Pose , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[19]  Ralph R. Martin,et al.  Parametric modeling of 3D human body shape - A survey , 2017, Comput. Graph..

[20]  Petros Daras,et al.  Real-Time, Full 3-D Reconstruction of Moving Foreground Objects From Multiple Consumer Depth Cameras , 2013, IEEE Transactions on Multimedia.

[21]  Hao Li,et al.  ARCH: Animatable Reconstruction of Clothed Humans , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Tao Yu,et al.  DeepHuman: 3D Human Reconstruction From a Single Image , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[23]  Cordelia Schmid,et al.  BodyNet: Volumetric Inference of 3D Human Body Shapes , 2018, ECCV.

[24]  Hao Li,et al.  PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[25]  Dimitrios Tzionas,et al.  Expressive Body Capture: 3D Hands, Face, and Body From a Single Image , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Marc Alexa,et al.  Laplacian mesh optimization , 2006, GRAPHITE '06.

[27]  Marcus A. Magnor,et al.  Tex2Shape: Detailed Full Human Body Geometry From a Single Image , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[28]  Hanbyul Joo,et al.  PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Gerard Pons-Moll,et al.  Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Sindhu B. Hegde,et al.  PIG-Net: Inception based Deep Learning Architecture for 3D Point Cloud Segmentation , 2021, Comput. Graph..

[31]  Juntao Ye,et al.  Modeling 3D human body with a smart vest , 2018, Comput. Graph..

[32]  Peter V. Gehler,et al.  Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image , 2016, ECCV.

[33]  Alec Jacobson,et al.  Fast winding numbers for soups and clouds , 2018, ACM Trans. Graph..

[34]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[35]  Marcus A. Magnor,et al.  Video Based Reconstruction of 3D People Models , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[36]  William E. Lorensen,et al.  Marching cubes: A high resolution 3D surface construction algorithm , 1987, SIGGRAPH.