Spatial temporal graph convolutional networks for skeleton-based dynamic hand gesture recognition

Hand gesture recognition plays an important role in human-computer interaction, and skeleton-based techniques are a promising approach to it. Several methods have been proposed in the literature to recognize hand gestures from skeletons. One problem with these methods is that they pay little attention to the connectivity between the joints of a skeleton and construct only simple graphs to represent it. Observing this, we build a new model of the hand skeleton that adds three types of edges to the graph to describe the linked motion of joints more finely. We then present an end-to-end deep neural network, the hand gesture graph convolutional network, in which convolution is performed only on linked skeleton joints. Because the training dataset is relatively small, this work proposes expanding the coordinate dimensionality so that the model can learn more semantic features. In addition, relative coordinates are employed to help the hand gesture graph convolutional network learn a feature representation that is independent of the random starting positions of actions. The proposed method is validated on two challenging datasets, and the experimental results show that it outperforms state-of-the-art methods. It is also relatively lightweight in practice for hand skeleton-based gesture recognition.
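Two of the ideas in the abstract, relative coordinates and convolution restricted to linked joints, can be illustrated with a minimal sketch. It is not the paper's implementation. The names `to_relative` and `aggregate_neighbors`, the choice of reference joint, and the use of the first frame as the origin are assumptions made for illustration. The aggregation step is an unweighted neighbor average with no learnable parameters, standing in for a graph convolution.

```python
from typing import List, Tuple

Joint = Tuple[float, float, float]   # (x, y, z) of one joint
Frame = List[Joint]                  # all joints of the hand in one frame


def to_relative(sequence: List[Frame], ref_joint: int = 0) -> List[Frame]:
    """Express every joint relative to the reference joint in the first frame.

    Assumption: the origin is the first-frame position of `ref_joint`.
    The result does not change when the whole gesture is translated.
    """
    ox, oy, oz = sequence[0][ref_joint]
    return [[(x - ox, y - oy, z - oz) for (x, y, z) in frame]
            for frame in sequence]


def aggregate_neighbors(frame: Frame, edges: List[Tuple[int, int]]) -> Frame:
    """Average each joint's coordinates with those of its linked joints.

    Only joints connected by an edge (plus the joint itself) contribute,
    which mirrors the idea of convolving only over linked skeleton joints.
    """
    neighbors = [{i} for i in range(len(frame))]
    for a, b in edges:               # edges are treated as undirected
        neighbors[a].add(b)
        neighbors[b].add(a)
    out = []
    for i in range(len(frame)):
        group = [frame[j] for j in sorted(neighbors[i])]
        out.append(tuple(sum(axis) / len(group) for axis in zip(*group)))
    return out
```

For example, translating an entire sequence by a constant offset leaves the output of `to_relative` unchanged, which is the sense in which relative coordinates remove the dependence on the starting position. Extra edge types, such as the three the abstract mentions, would simply appear as additional pairs in `edges`.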
