A-NeRF: Surface-free Human 3D Pose Refinement via Neural Rendering

Fig. 1. Our A-NeRF test-time optimization for monocular 3D human pose estimation jointly learns a volumetric body model of the user that can be animated and works with diverse body shapes (left), while also refining the initial 3D skeleton pose estimate from a single or, if available, multiple views without tedious camera calibration (right). Underlying is a surface-free neural representation and skeleton-based embedding coupled with volumetric rendering. Faces in H3.6M images blurred for anonymity.

[1]  Andreas Aristidou,et al.  MotioNet , 2020, ACM Trans. Graph..

[2]  Yaser Sheikh,et al.  Deep appearance models for face rendering , 2018, ACM Trans. Graph..

[3]  Yi Zhou,et al.  On the Continuity of Rotation Representations in Neural Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Hans-Peter Seidel,et al.  Shading-based dynamic shape refinement from multi-view video under general illumination , 2011, 2011 International Conference on Computer Vision.

[5]  Justus Thies,et al.  Deferred Neural Rendering: Image Synthesis using Neural Textures , 2019 .

[6]  Dimitrios Tzionas,et al.  Monocular Expressive Body Regression through Body-Driven Attention , 2020, ECCV.

[7]  Gordon Wetzstein,et al.  Implicit Neural Representations with Periodic Activation Functions , 2020, NeurIPS.

[8]  Jitendra Malik,et al.  End-to-End Recovery of Human Shape and Pose , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[9]  Hans-Peter Seidel,et al.  General Automatic Human Shape and Motion Capture Using Volumetric Contour Cues , 2016, ECCV.

[10]  Michael J. Black,et al.  VIBE: Video Inference for Human Body Pose and Shape Estimation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Andreas Geiger,et al.  GRAF: Generative Radiance Fields for 3D-Aware Image Synthesis , 2020, NeurIPS.

[12]  Yaser Sheikh,et al.  Monocular Total Capture: Posing Face, Body, and Hands in the Wild , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Gordon Wetzstein,et al.  State of the Art on Neural Rendering , 2020, Comput. Graph. Forum.

[14]  Kai Zhang,et al.  NeRF++: Analyzing and Improving Neural Radiance Fields , 2020, ArXiv.

[15]  Jiajun Wu,et al.  Neural Radiance Flow for 4D View Synthesis and Video Processing , 2020, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[16]  Marcus A. Magnor,et al.  Tex2Shape: Detailed Full Human Body Geometry From a Single Image , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[17]  Hans-Peter Seidel,et al.  EgoCap , 2016, ACM Trans. Graph..

[18]  Hanbyul Joo,et al.  PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Hao Li,et al.  Soft Rasterizer: A Differentiable Renderer for Image-Based 3D Reasoning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[20]  Christian Theobalt,et al.  Multi-Garment Net: Learning to Dress 3D People From Images , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[21]  Richard Szeliski,et al.  SynSin: End-to-End View Synthesis From a Single Image , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Hujun Bao,et al.  Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Christian Theobalt,et al.  MonoPerfCap , 2017, ACM Trans. Graph..

[24]  James J. Little,et al.  A Simple Yet Effective Baseline for 3d Human Pose Estimation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[25]  Christian Theobalt,et al.  LiveCap , 2018, ACM Trans. Graph..

[26]  Hao Li,et al.  ARCH: Animatable Reconstruction of Clothed Humans , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Victor Lempitsky,et al.  Neural Point-Based Graphics , 2019, ECCV.

[28]  Cristian Sminchisescu,et al.  Latent structured models for human pose estimation , 2011, 2011 International Conference on Computer Vision.

[29]  Charles T. Loop,et al.  Holoportation: Virtual 3D Teleportation in Real-time , 2016, UIST.

[30]  Hans-Peter Seidel,et al.  A Versatile Scene Model with Differentiable Visibility Applied to Generative Pose Estimation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[31]  Lourdes Agapito,et al.  Rethinking Pose in 3D: Multi-stage Refinement and Recovery for Markerless Motion Capture , 2018, 2018 International Conference on 3D Vision (3DV).

[32]  Zhen He,et al.  3D Human Pose Estimation With 2D Marginal Heatmaps , 2018, 2019 IEEE Winter Conference on Applications of Computer Vision (WACV).

[33]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[34]  Michael J. Black,et al.  Learning to Reconstruct 3D Human Pose and Shape via Model-Fitting in the Loop , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[35]  Hans-Peter Seidel,et al.  VNect , 2017, ACM Trans. Graph..

[36]  Iasonas Kokkinos,et al.  HoloPose: Holistic 3D Human Reconstruction In-The-Wild , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Ersin Yumer,et al.  Self-supervised Learning of Motion Capture , 2017, NIPS.

[38]  Hans-Peter Seidel,et al.  MovieReshape: tracking and reshaping of humans in videos , 2010, ACM Trans. Graph..

[39]  Michael J. Black,et al.  Detailed Human Shape and Pose from Images , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Jonathan T. Barron,et al.  Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains , 2020, NeurIPS.

[41]  Jonathan T. Barron,et al.  NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Pratul P. Srinivasan,et al.  NeRF , 2020, ECCV.

[43]  Francesc Moreno-Noguer,et al.  3D Human Pose Estimation from a Single Image via Distance Matrix Regression , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Francesc Moreno-Noguer,et al.  D-NeRF: Neural Radiance Fields for Dynamic Scenes , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Andrea Tagliasacchi,et al.  NASA: Neural Articulated Shape Approximation , 2020, ECCV.

[46]  Chia-Kai Liang,et al.  Portrait Neural Radiance Fields from a Single Image , 2020, ArXiv.

[47]  Xiaowei Zhou,et al.  Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Marcus A. Magnor,et al.  Video Based Reconstruction of 3D People Models , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[49]  Cristian Sminchisescu,et al.  Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[50]  Jonathan T. Barron,et al.  Deformable Neural Radiance Fields , 2020, ArXiv.

[51]  Michael J. Black,et al.  Three-D Safari: Learning to Estimate Zebra Pose, Shape, and Texture From Images “In the Wild” , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[52]  Alex Trevithick,et al.  GRF: Learning a General Radiance Field for 3D Scene Representation and Rendering , 2020, ArXiv.

[53]  Xiaowei Zhou,et al.  Learning to Estimate 3D Human Pose and Shape from a Single Color Image , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[54]  Justus Thies,et al.  Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Michael J. Black,et al.  SMPL: A Skinned Multi-Person Linear Model , 2023 .

[56]  Cordelia Schmid,et al.  Learning from Synthetic Humans , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Christian Theobalt,et al.  Non-Rigid Neural Radiance Fields: Reconstruction and Novel View Synthesis of a Dynamic Scene From Monocular Video , 2020, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[58]  Pascal Fua,et al.  XNect: Real-time Multi-Person 3D Motion Capture with a Single RGB Camera , 2020, SIGGRAPH 2020.

[59]  Marcus A. Magnor,et al.  Detailed Human Avatars from Monocular Video , 2018, 2018 International Conference on 3D Vision (3DV).

[60]  Victor Lempitsky,et al.  Learnable Triangulation of Human Pose , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[61]  Kwang-Ting Cheng,et al.  Cascaded Deep Monocular 3D Human Pose Estimation With Evolutionary Training Data , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[62]  Bingbing Ni,et al.  Deep Kinematics Analysis for Monocular 3D Human Pose Estimation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[63]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[64]  Michael J. Black,et al.  Estimating human shape and pose from a single image , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[65]  Peter V. Gehler,et al.  Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image , 2016, ECCV.

[66]  Gordon Wetzstein,et al.  DeepVoxels: Learning Persistent 3D Feature Embeddings , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[67]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[68]  Marcus A. Magnor,et al.  Learning to Reconstruct People in Clothing From a Single RGB Camera , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[69]  Noah Snavely,et al.  Neural Rerendering in the Wild , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[70]  Hao Li,et al.  PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[71]  Hans-Peter Seidel,et al.  Model-Based Outdoor Performance Capture , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[72]  Wei Zhang,et al.  Deep Kinematic Pose Regression , 2016, ECCV Workshops.

[73]  Matthew Tancik,et al.  pixelNeRF: Neural Radiance Fields from One or Few Images , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[74]  Hujun Bao,et al.  Motion Capture from Internet Videos , 2020, ECCV.

[75]  Peter V. Gehler,et al.  Unite the People: Closing the Loop Between 3D and 2D Human Representations , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[76]  Andrew W. Fitzgibbon,et al.  The Vitruvian manifold: Inferring dense correspondences for one-shot human pose estimation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[77]  Gordon Wetzstein,et al.  Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations , 2019, NeurIPS.

[78]  Changil Kim,et al.  Space-time Neural Irradiance Fields for Free-Viewpoint Video , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[79]  Hans-Peter Seidel,et al.  Fast articulated motion tracking using a sums of Gaussians body model , 2011, 2011 International Conference on Computer Vision.

[80]  Peter V. Gehler,et al.  Neural Body Fitting: Unifying Deep Learning and Model Based Human Pose and Shape Estimation , 2018, 2018 International Conference on 3D Vision (3DV).