SCALE: Modeling Clothed Humans with a Surface Codec of Articulated Local Elements

Learning to model and reconstruct humans in clothing is challenging due to articulation, non-rigid deformation, and varying clothing types and topologies. To enable learning, the choice of representation is key. Recent work uses neural networks to parameterize local surface elements. This approach captures locally coherent geometry and non-planar details, can deal with varying topology, and does not require registered training data. However, naively using such methods to model 3D clothed humans fails to capture fine-grained local deformations and generalizes poorly. To address this, we present three key innovations. First, we deform surface elements based on a human body model such that large-scale deformations caused by articulation are explicitly separated from topological changes and local clothing deformations. Second, we address the limitations of existing neural surface elements by regressing local geometry from local features, significantly improving expressiveness. Third, we learn a pose embedding on a 2D parameterization space that encodes posed body geometry, improving generalization to unseen poses by reducing non-local spurious correlations. We demonstrate the efficacy of our surface representation by learning models of complex clothing from point clouds. The clothing can change topology and deviate from the topology of the body. Once learned, we can animate previously unseen motions, producing high-quality point clouds, from which we generate realistic images with neural rendering. We assess the importance of each technical contribution and show that our approach outperforms state-of-the-art methods in terms of reconstruction accuracy and inference time. The code is available for research purposes at https://qianlim.github.io/SCALE.
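The first idea above, separating articulation from local clothing deformation, can be illustrated with a small NumPy sketch. It is not the released implementation: it assumes each surface element is anchored at a posed body point with its own local rotation, and that some network (omitted here) has already predicted offsets in that local frame. The names `pose_local_elements`, `rotation_z`, `body_points`, `local_frames`, and `local_residuals` are hypothetical and chosen only for illustration.

```python
import numpy as np


def pose_local_elements(body_points, local_frames, local_residuals):
    """Map per-element offsets from local frames into world space.

    Hypothetical helper, not taken from the SCALE code base.
    body_points:     (N, 3) posed body surface points (element origins).
    local_frames:    (N, 3, 3) one rotation per element (columns are local axes).
    local_residuals: (N, K, 3) K offsets per element, in local coordinates.
    Returns an (N, K, 3) array of world-space points.
    """
    # world_offsets[n, k] = local_frames[n] @ local_residuals[n, k]
    world_offsets = np.einsum("nij,nkj->nki", local_frames, local_residuals)
    # Articulation enters only through body_points and local_frames.
    return body_points[:, None, :] + world_offsets


def rotation_z(angle):
    """Rotation matrix about the z axis, used to build a toy local frame."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Toy data: two elements that share the same local offsets.
# The second element's frame is rotated by 90 degrees about z.
body_points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
local_frames = np.stack([np.eye(3), rotation_z(np.pi / 2)])
local_residuals = np.tile(
    np.array([[[0.1, 0.0, 0.0], [0.0, 0.0, 0.2]]]), (2, 1, 1)
)

points = pose_local_elements(body_points, local_frames, local_residuals)
print(points.shape)
```

In this toy setup the same local offsets land in different world positions only because the body point and frame differ, which is the sense in which pose-driven motion is kept apart from the locally predicted geometry.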
