A Two-Stream Neural Network for Pose-Based Hand Gesture Recognition

Pose based hand gesture recognition has been widely studied in the recent years. Compared with full body action recognition, hand gesture involves joints that are more spatially closely distributed with stronger collaboration. This nature requires a different approach from action recognition to capturing the complex spatial features. Many gesture categories, such as “Grab” and “Pinch”, have very similar motion or temporal patterns posing a challenge on temporal processing. To address these challenges, this paper proposes a two-stream neural network with one stream being a self-attention based graph convolutional network (SAGCN) extracting the short-term temporal information and hierarchical spatial information, and the other being a residual-connection enhanced bidirectional Independently Recurrent Neural Network (RBi-IndRNN) for extracting long-term temporal information. The self-attention based graph convolutional network has a dynamic self-attention mechanism to adaptively exploit the relationships of all hand joints in addition to the fixed topology and local feature extraction in the GCN. On the other hand, the residual-connection enhanced Bi-IndRNN extends an IndRNN with the capability of bidirectional processing for temporal modelling. The two streams are fused together for recognition. The Dynamic Hand Gesture dataset and First-Person Hand Action dataset are used to validate its effectiveness, and our method achieves state-ofthe-art performance.

[1]  Wenjun Zeng,et al.  An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data , 2016, AAAI.

[2]  Yin Wang,et al.  Efficient Temporal Sequence Comparison and Classification Using Gram Matrix Embeddings on a Riemannian Manifold , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Xilin Chen,et al.  Continuous Gesture Recognition with Hand-Oriented Spatiotemporal Feature , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[4]  Ajmal Mian,et al.  3D Action Recognition from Novel Viewpoints , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Mohan M. Trivedi,et al.  Joint Angles Similarities and HOG2 for Action Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[6]  Jian-Huang Lai,et al.  Jointly Learning Heterogeneous Features for RGB-D Activity Recognition , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Junsong Yuan,et al.  Robust hand gesture recognition based on finger-earth mover's distance with a commodity depth camera , 2011, ACM Multimedia.

[8]  Luc Brun,et al.  A Neural Network Based on SPD Manifold Learning for Skeleton-Based Hand Gesture Recognition , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Wanqing Li,et al.  Deep Independently Recurrent Neural Network (IndRNN) , 2019, ArXiv.

[10]  Pavlo Molchanov,et al.  Online Detection and Classification of Dynamic Hand Gestures with Recurrent 3D Convolutional Neural Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Xiaohui Xie,et al.  Co-Occurrence Feature Learning for Skeleton Based Action Recognition Using Regularized Deep LSTM Networks , 2016, AAAI.

[12]  Bodo Rosenhahn,et al.  Real-Time Sign Language Recognition Using a Consumer Depth Camera , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[13]  Yikang Yang,et al.  Performance Comparison of Gesture Recognition System Based on Different Classifiers , 2021, IEEE Transactions on Cognitive and Developmental Systems.

[14]  Shanxin Yuan,et al.  First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[15]  Wang Xi,et al.  Deep Learning for Hand Gesture Recognition on Skeletal Data , 2018, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).

[16]  Shuai Li,et al.  Bidirectional Independently Recurrent Neural Network for Skeleton-Based Hand Gesture Recognition , 2020, ISCAS.

[17]  Rongrong Ji,et al.  Deep Manifold Structure Transfer for Action Recognition , 2019, IEEE Transactions on Image Processing.

[18]  Guijin Wang,et al.  Motion feature augmented recurrent neural network for skeleton-based dynamic hand gesture recognition , 2017, 2017 IEEE International Conference on Image Processing (ICIP).

[19]  Dahua Lin,et al.  Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition , 2018, AAAI.

[20]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[21]  Zicheng Liu,et al.  HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Luc Van Gool,et al.  Building Deep Networks on Grassmann Manifolds , 2016, AAAI.

[23]  Pietro Zanuttigh,et al.  Hand gesture recognition with leap motion and kinect devices , 2014, 2014 IEEE International Conference on Image Processing (ICIP).

[24]  Yibo Li,et al.  Multi-View Hierarchical Bidirectional Recurrent Neural Network for Depth Video Sequence Based Action Recognition , 2018, Int. J. Pattern Recognit. Artif. Intell..

[25]  Juan Song,et al.  Learning Spatiotemporal Features Using 3DCNN and Convolutional LSTM for Gesture Recognition , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[26]  Guillermo Garcia-Hernando,et al.  Transition Forests: Learning Discriminative Temporal Transitions for Action Recognition and Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Andrew Zisserman,et al.  Convolutional Two-Stream Network Fusion for Video Action Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Zhaojie Ju,et al.  Multimodal Human Hand Motion Sensing and Analysis—A Review , 2019, IEEE Transactions on Cognitive and Developmental Systems.

[29]  Ebru Arisoy,et al.  Bidirectional recurrent neural network language models for automatic speech recognition , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[30]  Yong Li,et al.  Spatial temporal graph convolutional networks for skeleton-based dynamic hand gesture recognition , 2019, EURASIP J. Image Video Process..

[31]  Bruce A. Draper,et al.  Gesture Recognition: Focus on the Hands , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32]  Lei Shi,et al.  Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Hazem Wannous,et al.  Skeleton-Based Dynamic Hand Gesture Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[34]  Shuai Li,et al.  Independently Recurrent Neural Network (IndRNN): Building A Longer and Deeper RNN , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[35]  Rama Chellappa,et al.  Human Action Recognition by Representing 3D Skeletons as Points in a Lie Group , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Cristian Sminchisescu,et al.  The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[37]  Luigi Cinque,et al.  Exploiting Recurrent Neural Networks and Leap Motion Controller for the Recognition of Sign Language and Semaphoric Hand Gestures , 2018, IEEE Transactions on Multimedia.

[38]  David Filliat,et al.  3D Hand Gesture Recognition Using a Depth and Skeletal Dataset , 2017, 3DOR@Eurographics.

[39]  Yong Du,et al.  Hierarchical recurrent neural network for skeleton based action recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Yuan He,et al.  Progressive Relation Learning for Group Activity Recognition , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Lu Yang,et al.  Survey on 3D Hand Gesture Recognition , 2016, IEEE Transactions on Circuits and Systems for Video Technology.

[42]  Whoi-Yul Kim,et al.  Skeleton-Based Dynamic Hand Gesture Recognition Using a Part-Based GRU-RNN for Gesture-Based Interface , 2020, IEEE Access.

[43]  Yang Liu,et al.  A Rapid Spiking Neural Network Approach With an Application on Hand Gesture Recognition , 2019, IEEE Transactions on Cognitive and Developmental Systems.

[44]  Xilin Chen,et al.  Hand Posture Recognition from Disparity Cost Map , 2012, ACCV.

[45]  Alberto Del Bimbo,et al.  Submitted to Ieee Transactions on Cybernetics 1 3d Human Action Recognition by Shape Analysis of Motion Trajectories on Riemannian Manifold , 2022 .

[46]  Luc Van Gool,et al.  A Riemannian Network for SPD Matrix Learning , 2016, AAAI.

[47]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..