Paper information - PeelNet: Textured 3D reconstruction of human body using single view RGB image

Reconstructing human shape and pose from a single image is a challenging problem due to issues such as severe self-occlusions, clothing variations, and changes in lighting. Many applications in the entertainment industry, e-commerce, health care (physiotherapy), and mobile-based AR/VR platforms can benefit from recovering the 3D human shape, pose, and texture. In this paper, we present PeelNet, an end-to-end generative adversarial framework to tackle the problem of textured 3D reconstruction of the human body from a single RGB image. Motivated by ray tracing for generating realistic images of a 3D scene, we tackle this problem by representing the human body as a set of peeled depth and RGB maps, which are obtained by extending rays beyond the first intersection with the 3D object. This formulation allows us to handle self-occlusions efficiently. Current parametric model-based approaches are designed for the underlying naked human body and fail to model loose clothing and surface-level details. The majority of non-parametric approaches are either computationally expensive or provide unsatisfactory results. We present a simple non-parametric solution in which the peeled maps are generated from a single RGB image as input. Our proposed peeled depth maps are back-projected to a 3D volume to obtain a complete 3D shape. The corresponding RGB maps provide vertex-level texture details. We compare our method against current state-of-the-art methods in 3D reconstruction and demonstrate its effectiveness on the BUFF and MonoPerfCap datasets.
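The peeled representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: in PeelNet the peeled maps are predicted by a network from an RGB image, whereas here they are computed by casting rays through a toy scene of two spheres. The pinhole intrinsics (fx, fy, cx, cy), the choice of four layers, the use of 0.0 to mark "no surface", and all function names are assumptions made for illustration only.

```python
import math

def pixel_ray(u, v, fx, fy, cx, cy):
    """Unit direction of the camera ray through pixel (u, v), pinhole model (assumed)."""
    d = ((u - cx) / fx, (v - cy) / fy, 1.0)
    n = math.sqrt(sum(x * x for x in d))
    return tuple(x / n for x in d)

def ray_sphere_hits(direction, center, radius):
    """Distances t > 0 at which a unit ray from the origin crosses a sphere."""
    b = -sum(d * c for d, c in zip(direction, center))
    c = sum(x * x for x in center) - radius * radius
    disc = b * b - c
    if disc <= 0.0:  # miss or grazing contact: no entry/exit pair
        return []
    root = math.sqrt(disc)
    return [t for t in (-b - root, -b + root) if t > 0.0]

def peel_depth_maps(spheres, width, height, fx, fy, cx, cy, num_layers=4):
    """Layer k stores the z-depth of the k-th surface crossing along each
    pixel's ray, i.e. the ray is extended beyond its first intersection.
    Pixels with fewer than k + 1 crossings keep the value 0.0."""
    layers = [[[0.0] * width for _ in range(height)] for _ in range(num_layers)]
    for v in range(height):
        for u in range(width):
            d = pixel_ray(u, v, fx, fy, cx, cy)
            hits = sorted(t for c, r in spheres for t in ray_sphere_hits(d, c, r))
            for k, t in enumerate(hits[:num_layers]):
                layers[k][v][u] = t * d[2]  # ray distance -> z-depth
    return layers

def back_project(layers, fx, fy, cx, cy):
    """Lift every non-empty peeled depth sample back to a 3D point."""
    points = []
    for depth in layers:
        for v, row in enumerate(depth):
            for u, z in enumerate(row):
                if z > 0.0:
                    points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points

# Toy scene: two spheres, one hidden behind the other as seen from the camera.
SPHERES = [((0.0, 0.0, 2.0), 0.5), ((0.0, 0.0, 4.0), 0.5)]
K = dict(fx=20.0, fy=20.0, cx=4.0, cy=4.0)  # hypothetical intrinsics

layers = peel_depth_maps(SPHERES, width=9, height=9, **K)
points = back_project(layers, **K)
```

In this toy scene the ray through the center pixel crosses surfaces at depths 1.5, 2.5, 3.5, and 4.5, so the second sphere, which a single depth map would miss entirely, is recovered from layers 3 and 4. A peeled RGB map would store one color per sample in the same layout, which is how each back-projected point could receive a texture value.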
