SIA-GCN: A Spatial Information Aware Graph Neural Network with 2D Convolutions for Hand Pose Estimation

Graph Neural Networks (GNNs) generalize neural networks from applications on regular structures to applications on arbitrary graphs, and have shown success in many application domains such as computer vision, social networks and chemistry. In this paper, we extend GNNs along two directions: a) allowing features at each node to be represented by 2D spatial confidence maps instead of 1D vectors; and b) proposing an efficient operation to integrate information from neighboring nodes through 2D convolutions with different learnable kernels at each edge. The proposed SIA-GCN can efficiently extract spatial information from 2D maps at each node and propagate them through graph convolution. By associating each edge with a designated convolution kernel, the SIA-GCN could capture different spatial relationships for different pairs of neighboring nodes. We demonstrate the utility of SIA-GCN on the task of estimating hand keypoints from single-frame images, where the nodes represent the 2D coordinate heatmaps of keypoints and the edges denote the kinetic relationships between keypoints. Experiments on multiple datasets show that SIA-GCN provides a flexible and yet powerful framework to account for structural constraints between keypoints, and can achieve state-of-the-art performance on the task of hand pose estimation.

[1]  Christian Szegedy,et al.  DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Luc Van Gool,et al.  Dense 3D Regression for Hand Pose Estimation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[3]  Christian Theobalt,et al.  Real-Time Hand Tracking Under Occlusion from an Egocentric RGB-D Sensor , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[4]  Varun Ramakrishna,et al.  Convolutional Pose Machines , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Dong Liu,et al.  Deep High-Resolution Representation Learning for Human Pose Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Luc Van Gool,et al.  Thin-Slicing Network: A Deep Structured Model for Pose Estimation in Videos , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Yang Xiao,et al.  A2J: Anchor-to-Joint Regression Network for 3D Articulated Pose Estimation From a Single Depth Image , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[8]  Haoyu Ma,et al.  Adaptive Graphical Model Network for 2D Handpose Estimation , 2019, BMVC.

[9]  Tae-Kyun Kim,et al.  Augmented Skeleton Space Transfer for Depth-Based Hand Pose Estimation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[10]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[11]  Nadia Magnenat-Thalmann,et al.  Exploiting Spatial-Temporal Relationships for 3D Pose Estimation via Graph Convolutional Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[12]  Bardia Doosti,et al.  Hand Pose Estimation: A Survey , 2019, ArXiv.

[13]  Yaser Sheikh,et al.  Hand Keypoint Detection in Single Images Using Multiview Bootstrapping , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Christian Theobalt,et al.  HandVoxNet: Deep Voxel-Based Network for 3D Hand Shape and Pose Estimation From a Single Depth Map , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Charless C. Fowlkes,et al.  Predicting Camera Viewpoint Improves Cross-dataset Generalization for 3D Human Pose Estimation , 2020, ECCV Workshops.

[16]  Jianfei Cai,et al.  Weakly-Supervised 3D Hand Pose Estimation from Monocular RGB Images , 2018, ECCV.

[17]  Haoyu Ma,et al.  Nonparametric Structure Regularization Machine for 2D Hand Pose Estimation , 2020, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).

[18]  Alan L. Yuille,et al.  Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations , 2014, NIPS.

[19]  Yu Tian,et al.  Semantic Graph Convolutional Networks for 3D Human Pose Regression , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Haoyu Ma,et al.  Rotation-invariant Mixed Graphical Model Network for 2D Hand Pose Estimation , 2020, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).

[21]  Daniel Thalmann,et al.  Robust 3D Hand Pose Estimation in Single Depth Images: From Single-View CNN to Multi-View CNNs , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Yichen Wei,et al.  Simple Baselines for Human Pose Estimation and Tracking , 2018, ECCV.

[23]  Dahua Lin,et al.  Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition , 2018, AAAI.

[24]  Xiaogang Wang,et al.  End-to-End Learning of Deformable Mixture of Parts and Deep Convolutional Neural Networks for Human Pose Estimation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Sergio Escalera,et al.  Depth-Based 3D Hand Pose Estimation: From Current Achievements to Future Goals , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[26]  David J. Crandall,et al.  HOPE-Net: A Graph-Based Model for Hand-Object Pose Estimation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Junsong Yuan,et al.  Point-to-Point Regression PointNet for 3D Hand Pose Estimation , 2018, ECCV.

[28]  Sergio Orts,et al.  Large-scale Multiview 3D Hand Pose Dataset , 2017, Image Vis. Comput..

[29]  Yebin Liu,et al.  Mask-Pose Cascaded CNN for 2D Hand Pose Estimation From Single Color Image , 2019, IEEE Transactions on Circuits and Systems for Video Technology.

[30]  Jonathan Tompson,et al.  Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation , 2014, NIPS.

[31]  Luc Van Gool,et al.  Self-Supervised 3D Hand Pose Estimation Through Training by Fitting , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Pascal Frossard,et al.  The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains , 2012, IEEE Signal Processing Magazine.

[33]  Thomas Brox,et al.  Learning to Estimate 3D Hand Pose from Single RGB Images , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[34]  Zhe Wang,et al.  Geometric Pose Affordance: 3D Human Pose with Scene Constraints , 2019, ArXiv.

[35]  Philip S. Yu,et al.  A Comprehensive Survey on Graph Neural Networks , 2019, IEEE Transactions on Neural Networks and Learning Systems.

[36]  Max Welling,et al.  Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.

[37]  Gang Yu,et al.  Cascaded Pyramid Network for Multi-person Pose Estimation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.