Paper information - Im2Hands: Learning Attentive Implicit Representation of Interacting Two-Hand Shapes

We present Implicit Two Hands (Im2Hands), the first neural implicit representation of two interacting hands. Unlike existing methods for two-hand reconstruction that rely on a parametric hand model and/or low-resolution meshes, Im2Hands can produce fine-grained geometry of two hands with high hand-to-hand and hand-to-image coherency. To handle the shape complexity and interaction context between two hands, Im2Hands models the occupancy volume of two hands, conditioned on an RGB image and coarse 3D keypoints, using two novel attention-based modules responsible for (1) initial occupancy estimation and (2) context-aware occupancy refinement, respectively. Im2Hands first learns per-hand neural articulated occupancy in the canonical space designed for each hand using query-image attention. It then refines the initial two-hand occupancy in the posed space to enhance the coherency between the two hand shapes using query-anchor attention. In addition, we introduce an optional keypoint refinement module to enable robust two-hand shape estimation from predicted hand keypoints in a single-image reconstruction scenario. We experimentally demonstrate the effectiveness of Im2Hands on two-hand reconstruction in comparison to related methods, where it achieves state-of-the-art results. Our code is publicly available at https://github.com/jyunlee/Im2Hands.
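
As a rough illustration of the attention-then-occupancy idea described in the abstract, the sketch below lets one 3D query point's feature attend over a set of image feature tokens and maps the attended feature to an occupancy probability. It is a minimal toy in plain Python, not the authors' implementation: the function names (`cross_attention`, `occupancy`), the feature dimensions, the random inputs, and the linear readout are all assumptions made for illustration, and the real model uses learned networks.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, keys, values):
    """Scaled dot-product attention of one query vector over key/value vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    attended = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return attended, weights


def occupancy(query_feature, image_tokens, readout):
    """Hypothetical readout: attended feature -> logit -> probability in (0, 1)."""
    attended, _ = cross_attention(query_feature, image_tokens, image_tokens)
    logit = sum(r * a for r, a in zip(readout, attended))
    return 1.0 / (1.0 + math.exp(-logit))


# Toy inputs (assumed): 6 image tokens and one query feature, all 4-dimensional.
rng = random.Random(0)
image_tokens = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(6)]
query_feature = [rng.uniform(-1, 1) for _ in range(4)]
readout = [rng.uniform(-1, 1) for _ in range(4)]

attended, weights = cross_attention(query_feature, image_tokens, image_tokens)
occ = occupancy(query_feature, image_tokens, readout)
print(f"occupancy probability for the query point: {occ:.3f}")
```

Evaluating such a function on a dense grid of query points and thresholding the result (for example at 0.5) gives an occupancy volume from which a surface mesh could be extracted. In this toy the same pattern would apply to the refinement stage by swapping the image tokens for anchor features.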

Tae-Kyun Kim | Minhyuk Sung | H. Choi | Jihyun Lee
