A deep-learning-based multimodal depth-aware dynamic hand gesture recognition system

The dynamic hand gesture recognition task has been studied with a variety of unimodal and multimodal methods. Previous work has explored multimodal fusion CRNNs (Convolutional Recurrent Neural Networks) based on depth and 2D skeleton data, but these models have fallen short of the expected recognition accuracy. In this paper, we revisit this approach to hand gesture recognition and propose several improvements. We observe that raw depth images have low contrast within the hand region of interest (ROI) and do not highlight important fine details, such as finger orientation, overlap between a finger and the palm, or overlap between multiple fingers. We therefore propose quantizing the depth values into several discrete regions to create higher contrast between key parts of the hand. In addition, we suggest several ways to tackle the high-variance problem in existing multimodal fusion CRNN architectures. We evaluate our method on two benchmarks: the DHG-14/28 dataset and the SHREC'17 track dataset. Our approach shows a significant improvement in accuracy and parameter efficiency over previous similar multimodal methods, with results comparable to the state of the art.
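The depth-quantization idea can be illustrated with a short sketch. This is a minimal example in Python with NumPy, not the authors' implementation. The abstract does not state how many regions are used or how their boundaries are chosen, so the following are all assumptions: the hypothetical function name `quantize_depth`, the default of four equal-width bands between the nearest and farthest hand pixel, the convention that a depth value of 0 marks background, and the output gray levels (nearer bands brighter).

```python
import numpy as np


def quantize_depth(depth, num_levels=4, background=0):
    """Quantize hand-ROI depth values into `num_levels` discrete bands.

    Hypothetical sketch; assumptions not taken from the paper:
      * pixels equal to `background` lie outside the hand and stay 0,
      * bands have equal width between the nearest and farthest hand pixel,
      * the nearest band maps to 255 and farther bands map to darker levels.
    """
    depth = np.asarray(depth, dtype=np.float64)
    out = np.zeros(depth.shape, dtype=np.uint8)

    mask = depth != background          # hand pixels only
    if not mask.any():
        return out                      # empty ROI: nothing to quantize

    d = depth[mask]
    lo, hi = d.min(), d.max()
    if hi == lo:
        out[mask] = 255                 # flat ROI collapses to a single band
        return out

    # Interior edges splitting [lo, hi] into num_levels equal-width bands.
    edges = np.linspace(lo, hi, num_levels + 1)[1:-1]
    # Band index per pixel: 0 = nearest band, num_levels - 1 = farthest band.
    bands = np.digitize(d, edges)

    # Evenly spaced gray levels so adjacent bands are clearly separated.
    levels = np.linspace(255, 255 / num_levels, num_levels).round().astype(np.uint8)
    out[mask] = levels[bands]
    return out


if __name__ == "__main__":
    roi = np.array([[0, 500, 510],
                    [540, 560, 600]])
    print(quantize_depth(roi))
```

With the toy ROI above, the raw hand depths (500 to 600) differ by only a few percent, while the quantized output assigns them four well-separated gray levels (255, 191, 128, 64), and the background pixel stays 0. Equal-width bands are simply the easiest choice to illustrate; percentile-based or fixed metric-width bands would be equally plausible alternatives.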
