ICON: Implicit Clothed humans Obtained from Normals

Current methods for learning realistic and animatable 3D clothed avatars need either posed 3D scans or 2D images with carefully controlled user poses. In contrast, our goal is to learn an avatar from only 2D images of people in unconstrained poses. Given a set of images, our method estimates a detailed 3D surface from each image and then combines these into an animatable avatar. Implicit functions are well suited to the first task, as they can capture details like hair and clothes. Current methods, however, are not robust to varied human poses and often produce 3D surfaces with broken or disembodied limbs, missing details, or non-human shapes. The problem is that these methods use global feature encoders that are sensitive to global pose. To address this, we propose ICON ("Implicit Clothed humans Obtained from Normals"), which instead uses local features. ICON has two main modules, both of which exploit the SMPL(-X) body model. First, ICON infers detailed clothed-human normals (front/back) conditioned on the SMPL(-X) normals. Second, a visibility-aware implicit surface regressor produces an iso-surface of a human occupancy field. Importantly, at inference time, a feedback loop alternates between refining the SMPL(-X) mesh using the inferred clothed normals and then refining the normals. Given multiple reconstructed frames of a subject in varied poses, we use a modified version of SCANimate to produce an animatable avatar from them. Evaluation on the AGORA and CAPE datasets shows that ICON outperforms the state of the art in reconstruction, even with heavily limited training data. Additionally, it is much more robust to out-of-distribution samples, e.g., in-the-wild poses/images and out-of-frame cropping. ICON takes a step towards robust 3D clothed human reconstruction from in-the-wild images. This enables avatar creation directly from video with personalized pose-dependent cloth deformation. Models and code are available for research at https://icon.is.tue.mpg.de.
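The inference-time feedback loop described above can be illustrated with a deliberately tiny toy. This is a sketch under strong assumptions, not ICON's implementation: the body is reduced to a single angle `theta`, a "normal map" is a single 2D unit vector, and `predict_clothed_normal` is a hypothetical stand-in for the learned normal predictor that simply blends image evidence with the body-normal prior. All function names and parameters here are illustrative. The only point is the alternation: a better body estimate yields better clothed normals, which in turn yield a better body estimate.

```python
import math


def body_normal(theta):
    """Toy stand-in for rendering body-model normals: one 2D unit normal."""
    return (math.cos(theta), math.sin(theta))


def predict_clothed_normal(image_normal, body_n, w=0.5):
    """Hypothetical stand-in for a learned normal predictor.

    It blends image evidence with the body-normal prior, so its output
    depends on (is conditioned on) the current body estimate.
    """
    x = w * image_normal[0] + (1.0 - w) * body_n[0]
    y = w * image_normal[1] + (1.0 - w) * body_n[1]
    norm = math.hypot(x, y)
    return (x / norm, y / norm)


def refine_body(theta, clothed_n, lr=0.5):
    """One gradient step on loss = 1 - <body_normal(theta), clothed_n>."""
    bx, by = body_normal(theta)
    # d/dtheta [1 - (cos(theta) * cx + sin(theta) * cy)]
    grad = by * clothed_n[0] - bx * clothed_n[1]
    return theta - lr * grad


def feedback_loop(image_normal, theta_init, iterations=50):
    """Alternate between refining the body estimate and the clothed normals."""
    theta = theta_init
    clothed_n = predict_clothed_normal(image_normal, body_normal(theta))
    for _ in range(iterations):
        # Step 1: refine the body using the current clothed normals.
        theta = refine_body(theta, clothed_n)
        # Step 2: re-predict clothed normals given the refined body.
        clothed_n = predict_clothed_normal(image_normal, body_normal(theta))
    return theta, clothed_n


if __name__ == "__main__":
    true_theta = 1.0  # assumed ground-truth pose for the toy
    theta, clothed_n = feedback_loop(body_normal(true_theta), theta_init=0.2)
    print("remaining pose error:", abs(theta - true_theta))
```

In this toy, each round shrinks the pose error because the blended normal always lies between the current body normal and the image evidence. The real system operates on full front/back normal maps and an SMPL(-X) mesh, which this sketch does not attempt to model.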
