Paper Information - Dynamic Surface Function Networks for Clothed Human Bodies

Dynamic Surface Function Networks for Clothed Human Bodies

We present a novel method for temporally coherent reconstruction and tracking of clothed humans. Given a monocular RGB-D sequence, we learn a person-specific body model based on a dynamic surface function network. To this end, we explicitly model the surface of the person using a multi-layer perceptron (MLP) embedded in the canonical space of the SMPL body model. With classical forward rendering, the represented surface can be rasterized using the topology of a template mesh. For each surface point of the template mesh, the MLP is evaluated to predict the actual surface location. To handle pose-dependent deformations, the MLP is conditioned on the SMPL pose parameters. We show that this surface representation, as well as the pose parameters, can be learned in a self-supervised fashion using the principle of analysis-by-synthesis and differentiable rasterization. As a result, we are able to reconstruct a temporally coherent mesh sequence from the input data. The underlying surface representation can be used to synthesize new animations of the reconstructed person, including pose-dependent deformations.
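The core idea of the abstract, an MLP that is evaluated at each template surface point and conditioned on the pose, can be illustrated with a minimal, dependency-free sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the helper names (`init_mlp`, `mlp_forward`, `surface_points`), the layer sizes, the ReLU activation, the random initialization, and the choice to concatenate all 72 SMPL axis-angle pose values to the input point. Skinning into posed space, differentiable rasterization, and training are omitted entirely.

```python
import math
import random

POSE_DIM = 72  # SMPL pose: 24 joints x 3 axis-angle values (assumed conditioning input)


def init_mlp(sizes, seed=0):
    """Create randomly initialized (weights, biases) pairs for each layer (hypothetical helper)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def mlp_forward(layers, x):
    """Evaluate the MLP on one input vector; ReLU on all layers except the last."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


def surface_points(layers, template_points, pose):
    """Predict one 3D surface location per canonical template point, conditioned on the pose."""
    return [mlp_forward(layers, list(p) + list(pose)) for p in template_points]


if __name__ == "__main__":
    layers = init_mlp([3 + POSE_DIM, 32, 32, 3])
    template = [(0.0, 0.0, 0.0), (0.1, 0.2, 0.3)]  # toy canonical-space points
    rest_pose = [0.0] * POSE_DIM
    for point in surface_points(layers, template, rest_pose):
        print(point)
```

Because the template points keep their indices across frames, the predicted locations inherit the template mesh topology, which is what makes rasterizing the represented surface with a standard forward renderer possible. In a real system the untrained random weights above would be replaced by parameters optimized against the RGB-D observations.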
