Dynamic Multiview Refinement of 3D Hand Datasets using Differentiable Ray Tracing

With the growing number of AI applications for 3D hand state estimation, the quality of the datasets used to train the relevant models is of utmost importance. Especially for datasets consisting of real-world images, the quality of the annotations, i.e., how accurately the provided ground truth reflects the true state of the scene, can greatly affect the performance of downstream applications. In this work, we propose a methodology for improving widely used 3D hand geometry datasets that contain real images with imperfect annotations. Our approach leverages multi-view imagery, temporal consistency, and a disentangled representation of hand shape, texture, and environment lighting. This allows us to refine the hand geometry of existing datasets and also paves the way for texture extraction. Extensive experiments on synthetic and real-world data show that our method outperforms the current state of the art, resulting in more accurate and visually pleasing reconstructions of hand gestures.
