AniPixel: Towards Animatable Pixel-Aligned Human Avatar

Neural radiance fields that use pixel-aligned features can render photo-realistic novel views. However, when pixel-aligned features are applied directly to human avatar reconstruction, rendering is limited to static humans and cannot produce animatable avatars. In this paper, we propose AniPixel, a novel animatable and generalizable human avatar reconstruction method that leverages pixel-aligned features for body geometry prediction and RGB color blending. Technically, to align the canonical space with the target space and the observation space, we propose a bidirectional neural skinning field based on skeleton-driven deformation that establishes the target-to-canonical and canonical-to-observation correspondences. We then disentangle the canonical body geometry into a normalized neutral-sized body and a subject-specific residual for better generalizability. Because geometry and appearance are closely related, we use pixel-aligned features to support body geometry prediction and detailed surface normals to reinforce RGB color blending. Moreover, we devise a shading module that depends on both pose and view direction to represent local illumination variance. Experiments show that AniPixel renders novel views of quality comparable to state-of-the-art methods while delivering better novel-pose animation results. The code will be released.
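
As a rough illustration of the skeleton-driven deformation that the bidirectional skinning field builds on, the sketch below applies standard linear blend skinning in both directions with NumPy. It is not AniPixel's implementation: the function names, the fixed hand-written weights, and the two-bone skeleton are all assumptions made for the example, whereas the method described above predicts the skinning weights with a neural field. The sketch also reuses the same weights in both directions, which makes the round trip exact; a learned field would query weights separately in each space.

```python
import numpy as np


def blend_transforms(weights, bone_transforms):
    """Blend per-bone 4x4 transforms with per-point skinning weights.

    weights: (N, K), each row summing to 1.
    bone_transforms: (K, 4, 4) rigid transforms, one per bone.
    Returns an (N, 4, 4) array of blended transforms.
    """
    return np.einsum("nk,kij->nij", weights, bone_transforms)


def to_homogeneous(points):
    """Append a 1 to every 3D point so 4x4 transforms can be applied."""
    return np.concatenate([points, np.ones((len(points), 1))], axis=1)


def canonical_to_posed(points, weights, bone_transforms):
    """Forward skinning: warp canonical points into a posed space."""
    blended = blend_transforms(weights, bone_transforms)
    return np.einsum("nij,nj->ni", blended, to_homogeneous(points))[:, :3]


def posed_to_canonical(points, weights, bone_transforms):
    """Backward skinning: apply the inverse blended transform per point."""
    blended = blend_transforms(weights, bone_transforms)
    inverse = np.linalg.inv(blended)  # batched inverse over the N matrices
    return np.einsum("nij,nj->ni", inverse, to_homogeneous(points))[:, :3]


def rotation_z(angle, translation):
    """Hypothetical helper: rigid transform rotating about z, then translating."""
    c, s = np.cos(angle), np.sin(angle)
    transform = np.eye(4)
    transform[:2, :2] = [[c, -s], [s, c]]
    transform[:3, 3] = translation
    return transform


# Toy skeleton (assumed): bone 0 stays fixed, bone 1 rotates 90 degrees and lifts.
bones = np.stack([np.eye(4), rotation_z(np.pi / 2, (0.0, 0.0, 0.5))])
# Hand-written weights (assumed): fully bone 0, an even mix, fully bone 1.
skin_weights = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
canonical_points = np.array([[1.0, 0.0, 0.0]] * 3)

posed_points = canonical_to_posed(canonical_points, skin_weights, bones)
recovered_points = posed_to_canonical(posed_points, skin_weights, bones)
print(posed_points)
print(np.allclose(recovered_points, canonical_points))
```

In this toy setup the target-to-canonical and canonical-to-observation mappings correspond to calling `posed_to_canonical` with the target pose's bone transforms and then `canonical_to_posed` with the observation pose's bone transforms.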
