Paper information - SCANimate: Weakly Supervised Learning of Skinned Clothed Avatar Networks

We present SCANimate, an end-to-end trainable framework that takes raw 3D scans of a clothed human and turns them into an animatable avatar. These avatars are driven by pose parameters and have realistic clothing that moves and deforms naturally. SCANimate does not rely on a customized mesh template or surface mesh registration. We observe that fitting a parametric 3D body model, like SMPL, to a clothed human scan is tractable while surface registration of the body topology to the scan is often not, because clothing can deviate significantly from the body shape. We also observe that articulated transformations are invertible, resulting in geometric cycle-consistency in the posed and unposed shapes. These observations lead us to a weakly supervised learning method that aligns scans into a canonical pose by disentangling articulated deformations without template-based surface registration. Furthermore, to complete missing regions in the aligned scans while modeling pose-dependent deformations, we introduce a locally pose-aware implicit function that learns to complete and model geometry with learned pose correctives. In contrast to commonly used global pose embeddings, our local pose conditioning significantly reduces long-range spurious correlations and improves generalization to unseen poses, especially when training data is limited. Our method can be applied to pose-aware appearance modeling to generate a fully textured avatar. We demonstrate our approach on various clothing types with different amounts of training data, outperforming existing solutions and other variants in terms of fidelity and generality in every setting. The code is available at https://scanimate.is.tue.mpg.de.
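The cycle-consistency observation in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a 2D setting, two hypothetical bones, and skinning weights that are given and identical in both directions (the paper learns these, which is not reproduced here). All function and variable names are illustrative. It shows only that, for fixed weights, the blended articulated transform can be inverted, so mapping a canonical point to the posed space and back recovers the original point.

```python
import math

def rigid_2d(angle, tx, ty):
    """Hypothetical helper: a 2D rigid bone transform as (rotation rows, translation)."""
    c, s = math.cos(angle), math.sin(angle)
    return ((c, -s), (s, c)), (tx, ty)

def blend(transforms, weights):
    """Linearly blend per-bone transforms with skinning weights (linear blend skinning)."""
    A = [[0.0, 0.0], [0.0, 0.0]]
    t = [0.0, 0.0]
    for (R, trans), w in zip(transforms, weights):
        for i in range(2):
            t[i] += w * trans[i]
            for j in range(2):
                A[i][j] += w * R[i][j]
    return A, t

def skin(point, transforms, weights):
    """Canonical -> posed: x' = A x + t."""
    A, t = blend(transforms, weights)
    x, y = point
    return (A[0][0] * x + A[0][1] * y + t[0],
            A[1][0] * x + A[1][1] * y + t[1])

def unskin(point, transforms, weights):
    """Posed -> canonical: x = A^{-1} (x' - t), using the same weights (an assumption)."""
    A, t = blend(transforms, weights)
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if abs(det) < 1e-12:
        raise ValueError("blended transform is singular and cannot be inverted")
    dx, dy = point[0] - t[0], point[1] - t[1]
    return ((A[1][1] * dx - A[0][1] * dy) / det,
            (-A[1][0] * dx + A[0][0] * dy) / det)

def cycle_error(point, transforms, weights):
    """Distance between a canonical point and its canonical -> posed -> canonical image."""
    posed = skin(point, transforms, weights)
    back = unskin(posed, transforms, weights)
    return math.dist(point, back)

# Illustrative values only: one bone at rest, one rotated by 60 degrees and shifted.
bones = [rigid_2d(0.0, 0.0, 0.0), rigid_2d(math.pi / 3, 0.5, -0.2)]
weights = [0.3, 0.7]
canonical_point = (1.0, 0.25)
print(cycle_error(canonical_point, bones, weights))
```

In this toy setting the round-trip error is zero up to floating-point precision. In a learned setting the weights predicted in the posed and canonical spaces need not agree, and a cycle error of this kind could then act as a training signal; how the paper formulates that loss is not specified in the abstract.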
