ECON: Explicit Clothed humans Optimized via Normal integration

The combination of deep learning, artist-curated scans, and Implicit Functions (IF) is enabling the creation of detailed, clothed, 3D humans from images. However, existing methods are far from perfect. IF-based methods recover free-form geometry, but produce disembodied limbs or degenerate shapes for novel poses or clothes. To increase robustness for these cases, existing work uses an explicit parametric body model to constrain surface reconstruction, but this limits the recovery of free-form surfaces such as loose clothing that deviates from the body. What we want is a method that combines the best properties of implicit representation and explicit body regularization. To this end, we make two key observations: (1) current networks are better at inferring detailed 2D maps than full-3D surfaces, and (2) a parametric model can be seen as a "canvas" for stitching together detailed surface patches. Based on these, our method, ECON, has three main steps: (1) It infers detailed 2D normal maps for the front and back side of a clothed person. (2) From these, it recovers 2.5D front and back surfaces, called d-BiNI, that are equally detailed, yet incomplete, and registers these w.r.t. each other with the help of a SMPL-X body mesh recovered from the image. (3) It "inpaints" the missing geometry between d-BiNI surfaces. If the face and hands are noisy, they can optionally be replaced with the ones of SMPL-X. As a result, ECON infers high-fidelity 3D humans even in loose clothes and challenging poses. This goes beyond previous methods, according to the quantitative evaluation on the CAPE and Renderpeople datasets. Perceptual studies also show that ECON's perceived realism is better by a large margin. Code and models are available for research purposes at econ.is.tue.mpg.de.
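To make step (2) more concrete, the sketch below shows how a depth surface can be recovered from a normal map in the simplest possible setting. It is not the d-BiNI optimization described above: it uses the classical Frankot-Chellappa Fourier-domain integration as a stand-in, and assumes orthographic projection, periodic image boundaries, and normals of the form (-dz/dx, -dz/dy, 1) up to scale. The function name `integrate_normals_fft` is hypothetical and chosen for illustration only.

```python
import numpy as np


def integrate_normals_fft(normals):
    """Recover a zero-mean depth map (H, W) from unit normals (H, W, 3).

    Classical Frankot-Chellappa integration; assumes orthographic
    projection and periodic boundaries (an illustrative simplification,
    not the method used by ECON).
    """
    nx, ny, nz = normals[..., 0], normals[..., 1], normals[..., 2]
    # Avoid division by zero where the surface is seen edge-on.
    nz = np.where(np.abs(nz) < 1e-6, 1e-6, nz)
    p = -nx / nz  # depth gradient dz/dx
    q = -ny / nz  # depth gradient dz/dy

    h, w = p.shape
    # Angular frequencies in radians per pixel along x (columns) and y (rows).
    wx = 2.0 * np.pi * np.fft.fftfreq(w)
    wy = 2.0 * np.pi * np.fft.fftfreq(h)
    WX, WY = np.meshgrid(wx, wy)

    denom = WX**2 + WY**2
    denom[0, 0] = 1.0  # placeholder; the DC term is set to zero below

    # Least-squares solution of the gradient equations in the Fourier domain.
    Z = (-1j * WX * np.fft.fft2(p) - 1j * WY * np.fft.fft2(q)) / denom
    Z[0, 0] = 0.0  # absolute depth offset is unobservable from normals
    return np.real(np.fft.ifft2(Z))
```

Because normals only constrain depth gradients, the result is defined up to a constant offset. That ambiguity is one reason a front and a back surface recovered this way need to be registered to each other, which ECON does with the help of the SMPL-X body mesh.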
