LoopReg: Self-supervised Learning of Implicit Surface Correspondences, Pose and Shape for 3D Human Mesh Registration

We address the problem of fitting 3D human models to 3D scans of dressed humans. Classical methods optimize both the data-to-model correspondences and the human model parameters (pose and shape), but are reliable only when initialized close to the solution. Some methods initialize the optimization with fully supervised correspondence predictors, an approach that is not differentiable end-to-end and can process only a single scan at a time. Our main contribution is LoopReg, an end-to-end learning framework to register a corpus of scans to a common 3D human model. The key idea is to create a self-supervised loop. A backward map, parameterized by a neural network (NN), predicts the correspondence from every scan point to the surface of the human model. A forward map, parameterized by a human model, transforms the corresponding points back to the scan based on the model parameters (pose and shape), thus closing the loop. Formulating this closed loop is not straightforward, because it is not trivial to force the output of the NN to lie on the surface of the human model; outside this surface the human model is not even defined. To this end, we propose two key innovations. First, we define the canonical surface implicitly as the zero level set of a distance field in ℝ³, which, in contrast to more common UV parameterizations, does not require cutting the surface, does not have discontinuities, and does not induce distortion. Second, we diffuse the human model to the 3D domain ℝ³. This allows us to map the NN predictions forward, even when they deviate slightly from the zero level set. Results demonstrate that we can train LoopReg mainly self-supervised: following a supervised warm-start, the model becomes increasingly accurate as additional unlabelled raw scans are processed. Our code and pre-trained models can be downloaded for research.
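The loop described above can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: the canonical surface is assumed to be a unit sphere rather than a human body model, the "human model" is a hypothetical scale-plus-translation map standing in for pose and shape, and the NN prediction is replaced by hand-made correspondences that sit slightly off the surface. All function names are illustrative assumptions. The point is only to show how extending the model to all of ℝ³ (here, by evaluating it at the closest surface point) keeps the forward map defined for off-surface predictions.

```python
import numpy as np

def distance_to_canonical(p):
    """Signed distance to the toy canonical surface (unit sphere = zero level set)."""
    return np.linalg.norm(p, axis=-1) - 1.0

def closest_surface_point(p):
    """Closest point on the unit sphere for any non-zero point in R^3."""
    return p / np.linalg.norm(p, axis=-1, keepdims=True)

def forward_map(p, scale, translation):
    """Toy model 'diffused' to R^3: evaluate the on-surface model at the
    closest surface point, so off-surface inputs are still well defined."""
    return scale * closest_surface_point(p) + translation

def loop_loss(scan, predicted, scale, translation):
    """Mean squared distance between scan points and their predicted
    correspondences mapped forward through the toy model."""
    residual = scan - forward_map(predicted, scale, translation)
    return float(np.mean(np.sum(residual ** 2, axis=-1)))

# Synthetic "scan": the toy model applied to random surface points.
rng = np.random.default_rng(0)
surface_pts = closest_surface_point(rng.normal(size=(100, 3)))
scale, translation = 1.5, np.array([0.1, 0.2, 0.3])
scan = scale * surface_pts + translation

# Stand-in for the backward map: correct correspondences, but pushed
# 0.05 units off the zero level set, as an imperfect network might do.
predicted = 1.05 * surface_pts

off_surface = distance_to_canonical(predicted)
good_loss = loop_loss(scan, predicted, scale, translation)
bad_loss = loop_loss(scan, predicted[::-1], scale, translation)  # scrambled matches
```

In this toy setting the loop closes for the off-surface but otherwise correct predictions, while scrambled correspondences give a clearly larger loss, which is the signal a self-supervised training loop of this kind would minimize.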
