Efficient 3D Articulated Human Generation with Layered Surface Volumes

Access to high-quality and diverse 3D articulated digital human assets is crucial in various applications, ranging from virtual reality to social platforms. Generative approaches, such as 3D generative adversarial networks (GANs), are rapidly replacing laborious manual content creation tools. However, existing 3D GAN frameworks typically rely on scene representations that leverage either template meshes, which are fast but offer limited quality, or volumes, which offer high capacity but are slow to render, thereby limiting the 3D fidelity in GAN settings. In this work, we introduce layered surface volumes (LSVs) as a new 3D object representation for articulated digital humans. LSVs represent a human body using multiple textured mesh layers around a conventional template. These layers are rendered using alpha compositing with fast differentiable rasterization, and they can be interpreted as a volumetric representation that allocates its capacity to a manifold of finite thickness around the template. Unlike conventional single-layer templates that struggle with representing fine off-surface details like hair or accessories, our surface volumes naturally capture such details. LSVs can be articulated, and they exhibit exceptional efficiency in GAN settings, where a 2D generator learns to synthesize the RGBA textures for the individual layers. Trained on unstructured, single-view 2D image datasets, our LSV-GAN generates high-quality and view-consistent 3D articulated digital humans without the need for view-inconsistent 2D upsampling networks.
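
The alpha compositing of the RGBA layers mentioned above can be illustrated with a minimal sketch. This is not the LSV-GAN implementation: it assumes the layers have already been rasterized into per-pixel RGBA images and sorted front to back, and the function name `composite_layers` and the array layout are hypothetical choices made for illustration only.

```python
import numpy as np


def composite_layers(rgba_layers):
    """Composite RGBA layers with the standard front-to-back "over" operator.

    Assumptions (illustrative, not taken from the paper):
      - rgba_layers has shape (N, H, W, 4) with values in [0, 1]
      - index 0 is the layer closest to the camera
      - colors are not premultiplied by alpha
    Returns (rgb, alpha) with shapes (H, W, 3) and (H, W).
    """
    rgba_layers = np.asarray(rgba_layers, dtype=np.float64)
    _, height, width, _ = rgba_layers.shape

    rgb = np.zeros((height, width, 3))
    # Fraction of light that still passes through the layers visited so far.
    transmittance = np.ones((height, width))

    for layer in rgba_layers:
        color, alpha = layer[..., :3], layer[..., 3]
        weight = transmittance * alpha          # contribution of this layer
        rgb += weight[..., None] * color
        transmittance = transmittance * (1.0 - alpha)

    return rgb, 1.0 - transmittance


# Usage: a half-transparent red layer in front of an opaque blue layer.
front = np.array([[[1.0, 0.0, 0.0, 0.5]]])  # shape (1, 1, 4)
back = np.array([[[0.0, 0.0, 1.0, 1.0]]])
rgb, alpha = composite_layers(np.stack([front, back]))
```

Every step is a simple weighted sum, so the same computation stays differentiable when written in an autodiff framework. That is one reason compositing a small, fixed number of rasterized layers can be cheaper than sampling a volume densely along each ray.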
