Learning Compositional Radiance Fields of Dynamic Human Heads

Photorealistic rendering of dynamic humans is an important capability for telepresence systems, virtual shopping, synthetic data generation, and more. Recently, neural rendering methods, which combine techniques from computer graphics and machine learning, have been used to create high-fidelity models of humans and objects. Some of these methods, such as Neural Volumes, do not produce results with high enough fidelity for driveable human models, whereas others, such as NeRF, have extremely long rendering times. We propose a novel compositional 3D representation that combines the best of previous methods to produce both higher-resolution and faster results. Our representation bridges the gap between discrete and continuous volumetric representations by combining a coarse 3D-structure-aware grid of animation codes with a continuous learned scene function that maps every position and its corresponding local animation code to its view-dependent emitted radiance and local volume density. Differentiable volume rendering is employed both to compute photorealistic novel views of the human head and upper body and to train our representation end-to-end using only 2D supervision. In addition, we show that the learned dynamic radiance field can be used to synthesize novel, unseen expressions based on a global animation code. Our approach achieves state-of-the-art results for synthesizing novel views of dynamic human heads and the upper body.
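
The combination of a coarse grid of local animation codes, a continuous scene function, and volume rendering may be easier to follow with a small sketch. The NumPy code below is illustrative only and is not the authors' implementation: the function names (`trilinear_code`, `scene_function`, `render_ray`), the grid layout `(G, G, G, C)`, the use of trilinear interpolation for the code lookup, the two-layer network with random weights, and the uniform sampling along the ray are all assumptions made for this example. It shows how a point on a camera ray could look up its local code, query radiance and density, and be alpha-composited into a pixel color.

```python
import numpy as np

def trilinear_code(grid, p):
    """Interpolate a local animation code at point p in [0, 1]^3.

    grid: (G, G, G, C) coarse grid of codes (hypothetical layout, G >= 2).
    Points outside the unit cube are clamped to its boundary.
    """
    G = grid.shape[0]
    x = np.clip(p, 0.0, 1.0) * (G - 1)            # continuous grid coordinates
    i0 = np.minimum(np.floor(x).astype(int), G - 2)  # lower corner of the cell
    t = x - i0                                     # fractional offset in the cell
    code = np.zeros(grid.shape[-1])
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((t[0] if dx else 1 - t[0]) *
                     (t[1] if dy else 1 - t[1]) *
                     (t[2] if dz else 1 - t[2]))
                code += w * grid[i0[0] + dx, i0[1] + dy, i0[2] + dz]
    return code

def scene_function(p, d, code, params):
    """Stand-in for the learned scene function (assumed tiny MLP):
    (position, view direction, local code) -> (rgb, density)."""
    W1, W2 = params
    h = np.maximum(W1 @ np.concatenate([p, d, code]), 0.0)  # ReLU hidden layer
    out = W2 @ h
    rgb = 1.0 / (1.0 + np.exp(-out[:3]))   # sigmoid keeps colors in (0, 1)
    sigma = np.logaddexp(0.0, out[3])      # softplus keeps density non-negative
    return rgb, sigma

def render_ray(origin, direction, grid, params, near=0.0, far=1.0, n_samples=32):
    """Alpha-composite uniformly spaced samples along one ray, front to back."""
    ts = np.linspace(near, far, n_samples)
    delta = (far - near) / (n_samples - 1)  # spacing between samples
    color = np.zeros(3)
    transmittance = 1.0                     # fraction of light not yet absorbed
    for t in ts:
        p = origin + t * direction
        rgb, sigma = scene_function(p, direction, trilinear_code(grid, p), params)
        alpha = 1.0 - np.exp(-sigma * delta)
        color += transmittance * alpha * rgb
        transmittance *= 1.0 - alpha
    return color, 1.0 - transmittance       # pixel color and accumulated opacity

# Usage with random (untrained) codes and weights, purely to exercise the code.
rng = np.random.default_rng(0)
grid = rng.normal(size=(4, 4, 4, 8))
params = (0.1 * rng.normal(size=(16, 3 + 3 + 8)), 0.1 * rng.normal(size=(4, 16)))
color, opacity = render_ray(np.array([0.5, 0.5, 0.0]),
                            np.array([0.0, 0.0, 1.0]), grid, params)
```

Every step in `render_ray` is differentiable with respect to the grid codes and the network weights, which is what would allow a representation of this kind to be trained from 2D images alone. In a full system the grid would be produced from a global animation code rather than sampled at random, as it is here.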
