Neural Point-based Volumetric Avatar: Surface-guided Neural Points for Efficient and Photorealistic Volumetric Head Avatar

Rendering photorealistic, dynamically moving human heads is crucial for a pleasant and immersive experience in AR/VR and video-conferencing applications. However, existing methods often struggle to model challenging facial regions (e.g., the mouth interior, eyes, and hair/beard), producing unrealistic and blurry results. In this paper, we propose Neural Point-based Volumetric Avatar ({\name}), a method that combines a neural point representation with neural volume rendering and discards the predefined connectivity and hard correspondences imposed by mesh-based approaches. Specifically, the neural points are constrained to lie around the surface of the target expression via a high-resolution UV displacement map, which increases modeling capacity and enables more accurate control. We introduce three technical innovations that improve rendering and training efficiency: a patch-wise, depth-guided shading-point sampling strategy; a lightweight radiance decoding process; and a Grid-Error-Patch (GEP) ray sampling strategy used during training. By design, {\name} is better equipped to handle topologically changing regions and thin structures while still ensuring accurate expression control when animating avatars. Experiments on three subjects from the Multiface dataset demonstrate the effectiveness of our designs: {\name} outperforms previous state-of-the-art methods, especially in challenging facial regions.
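The abstract names two geometric ideas without giving their formulas: placing neural points around the expression surface via a UV displacement map, and sampling shading points guided by depth. The sketch below illustrates generic versions of both under assumptions the abstract does not state. It assumes the tracked mesh has already been rasterized into UV space (one surface position and one unit normal per texel), that the displacement map stores a scalar offset along the normal, and that a per-ray depth estimate is available. It samples each ray independently and does not model the patch-wise grouping described above. The helpers `place_neural_points` and `depth_guided_samples` and the `half_width` band parameter are hypothetical; this is not the authors' implementation.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def place_neural_points(
    base_positions: List[List[Vec3]],
    normals: List[List[Vec3]],
    displacement: List[List[float]],
) -> List[Vec3]:
    """Offset each UV texel's surface point along its unit normal.

    Hypothetical helper. Assumes one surface position and one unit normal
    per texel (from rasterizing the tracked mesh into UV space) and a
    scalar displacement per texel: p = x + d * n.
    """
    points: List[Vec3] = []
    for row_x, row_n, row_d in zip(base_positions, normals, displacement):
        for x, n, d in zip(row_x, row_n, row_d):
            points.append((x[0] + d * n[0], x[1] + d * n[1], x[2] + d * n[2]))
    return points


def depth_guided_samples(
    origin: Vec3,
    direction: Vec3,
    depth: float,
    half_width: float = 0.01,
    num_samples: int = 8,
) -> List[Vec3]:
    """Place shading points uniformly in [depth - half_width, depth + half_width].

    Hypothetical helper. Assumes a per-ray depth estimate (e.g. the depth of
    the displaced surface) so that samples are concentrated in a narrow band
    instead of being spread along the whole ray.
    """
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    if num_samples == 1:
        ts = [depth]
    else:
        step = 2.0 * half_width / (num_samples - 1)
        ts = [depth - half_width + i * step for i in range(num_samples)]
    return [
        (origin[0] + t * direction[0],
         origin[1] + t * direction[1],
         origin[2] + t * direction[2])
        for t in ts
    ]


if __name__ == "__main__":
    # Two texels on a flat patch, both with normals along +z.
    pts = place_neural_points(
        [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]],
        [[(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]],
        [[0.5, -0.25]],
    )
    print(pts)
    # Three shading points in a band of half-width 0.5 around depth 2.0.
    print(depth_guided_samples((0.0, 0.0, 0.0), (0.0, 0.0, 1.0),
                               depth=2.0, half_width=0.5, num_samples=3))
```

Restricting samples to a narrow band around an estimated surface is what makes depth guidance cheaper than uniform sampling along the full ray. The band width trades robustness to depth error against the number of samples needed.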
