Paper Information - AniPixel: Towards Animatable Pixel-Aligned Human Avatar

AniPixel: Towards Animatable Pixel-Aligned Human Avatar

Neural radiance fields that use pixel-aligned features can render photo-realistic novel views. However, when pixel-aligned features are introduced directly into human avatar reconstruction, rendering is limited to static humans and does not extend to animatable avatars. In this paper, we propose AniPixel, a novel animatable and generalizable human avatar reconstruction method that leverages pixel-aligned features for body geometry prediction and RGB color blending. Technically, to align the canonical space with the target space and the observation space, we propose a bidirectional neural skinning field based on skeleton-driven deformation to establish the target-to-canonical and canonical-to-observation correspondences. Then, we disentangle the canonical body geometry into a normalized neutral-sized body and a subject-specific residual for better generalizability. As geometry and appearance are closely related, we introduce pixel-aligned features to facilitate the body geometry prediction and detailed surface normals to reinforce the RGB color blending. Moreover, we devise a pose-dependent and view direction-related shading module to represent the local illumination variance. Experiments show that AniPixel renders novel views of comparable quality while delivering better novel pose animation results than state-of-the-art methods. The code will be released.

Jing Zhang | Dacheng Tao | Zhi Hou | Jinlong Fan
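
The abstract describes a bidirectional skinning field built on skeleton-driven deformation. As a rough illustration of that idea only, the sketch below uses classic linear blend skinning with NumPy to map points from a canonical space to a posed space and back. It is not the authors' implementation: the function names, the hand-written skinning weights, and the two-bone setup are assumptions made for this example, and the same weights are reused in both directions, whereas a learned skinning field would predict them per point.

```python
import numpy as np


def rot_z(angle, translation):
    """Hypothetical helper: 4x4 rigid transform (rotation about z, then translation)."""
    c, s = np.cos(angle), np.sin(angle)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = translation
    return T


def blend_transforms(weights, bone_transforms):
    """Blend per-bone transforms with per-point skinning weights.

    weights: (N, K), each row sums to 1.
    bone_transforms: (K, 4, 4).
    Returns one blended 4x4 transform per point, shape (N, 4, 4).
    """
    return np.einsum("nk,kij->nij", weights, bone_transforms)


def apply_transforms(transforms, points):
    """Apply (N, 4, 4) transforms to (N, 3) points via homogeneous coordinates."""
    homo = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    return np.einsum("nij,nj->ni", transforms, homo)[:, :3]


def canonical_to_posed(points_c, weights, bone_transforms):
    """Forward skinning: canonical space -> posed space."""
    return apply_transforms(blend_transforms(weights, bone_transforms), points_c)


def posed_to_canonical(points_p, weights, bone_transforms):
    """Inverse skinning: posed space -> canonical space.

    Assumes the weights of each posed point are known; here they are
    simply reused from the forward pass.
    """
    blended = blend_transforms(weights, bone_transforms)
    return apply_transforms(np.linalg.inv(blended), points_p)


# Two bones: bone 0 stays fixed, bone 1 rotates 90 degrees about z and moves up.
bones = np.stack([np.eye(4), rot_z(np.pi / 2, [0.0, 0.0, 1.0])])
points = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.5]])
weights = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])

posed = canonical_to_posed(points, weights, bones)
recovered = posed_to_canonical(posed, weights, bones)
print(posed)
print(np.abs(recovered - points).max())
```

Because the inverse pass reuses the forward weights, the round trip recovers the canonical points up to floating-point error. In a learned setting the two directions are generally predicted separately, so this consistency is something a method has to encourage instead of getting it for free.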
