LiveHand: Real-time and Photorealistic Neural Hand Rendering

The human hand is the main medium through which we interact with our surroundings, which makes its digitization an important problem with direct applications in VR/AR, gaming, and media production, among other areas. While several works model the geometry of hands, little attention has been paid to capturing their photorealistic appearance. Moreover, real-time rendering is critical for applications in extended reality and gaming. We present the first neural-implicit approach to photorealistically render hands in real time. This is a challenging problem, as hands are textured and undergo strong articulations with pose-dependent effects. However, we show that this goal is achievable through our carefully designed method, which combines training on a low-resolution rendering of a neural radiance field with a 3D-consistent super-resolution module, mesh-guided sampling, and space canonicalization. We demonstrate a novel application of a perceptual loss in image space, which is critical for learning details accurately. We also show a live demo in which we photorealistically render the human hand in real time for the first time, while also modeling pose- and view-dependent appearance effects. We ablate all our design choices and show that they optimize for both rendering speed and quality. Our code will be released to encourage further research in this area. The supplementary video can be found at: tinyurl.com/46uvujzn
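As background for why rendering a neural radiance field at low resolution helps speed, here is a minimal sketch of the standard NeRF volume-rendering quadrature for a single ray, as defined in the original NeRF formulation, together with a helper that counts the rays traced per frame. The function names, the per-sample inputs, and the `downscale` factor are hypothetical illustrations. This is not LiveHand's implementation; its mesh-guided sampling, canonicalization, and super-resolution module are not reproduced here.

```python
import math
from typing import Sequence, Tuple

RGB = Tuple[float, float, float]


def composite_ray(sigmas: Sequence[float], colors: Sequence[RGB],
                  deltas: Sequence[float]) -> RGB:
    """Standard NeRF quadrature for one ray (illustrative sketch):
    C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i,
    where T_i = exp(-sum_{j<i} sigma_j * delta_j) is the transmittance."""
    if not (len(sigmas) == len(colors) == len(deltas)):
        raise ValueError("per-sample inputs must have equal length")
    out = [0.0, 0.0, 0.0]
    transmittance = 1.0  # T_1 = 1: nothing occludes the first sample
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        for k in range(3):
            out[k] += weight * color[k]
        # Multiplying by (1 - alpha) = exp(-sigma * delta) accumulates T_{i+1}.
        transmittance *= 1.0 - alpha
    return (out[0], out[1], out[2])


def rays_per_frame(height: int, width: int, downscale: int) -> int:
    """Rays traced when the field is rendered at (height/downscale, width/downscale).
    'downscale' is a hypothetical parameter; each ray needs its own set of samples."""
    return (height // downscale) * (width // downscale)


if __name__ == "__main__":
    # An empty sample followed by an opaque green one yields pure green.
    print(composite_ray([0.0, 1e6], [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [0.1, 0.1]))
    # Rendering at 1/4 resolution per side traces 16x fewer rays.
    print(rays_per_frame(512, 512, 1) // rays_per_frame(512, 512, 4))
```

Because the per-frame cost of the field grows with the number of rays, the ray count falls quadratically with the downscale factor; under this reading, the super-resolution module is what restores the output resolution afterwards.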
