Real-Time Monocular Skeleton-Based Hand Gesture Recognition Using 3D-Jointsformer

Automatic hand gesture recognition in video sequences has widespread applications, ranging from home automation to sign language interpretation and clinical operations. The primary challenge lies in achieving real-time recognition while managing temporal dependencies that can impact performance. Existing methods employ 3D convolutional or Transformer-based architectures with hand skeleton estimation, but both have limitations. To address these challenges, a hybrid approach that combines 3D Convolutional Neural Networks (3D-CNNs) and Transformers is proposed. A 3D-CNN first computes high-level semantic skeleton embeddings that capture the local spatial and temporal characteristics of hand gestures. A Transformer network with a self-attention mechanism then captures long-range temporal dependencies in the skeleton sequence. Evaluation on the Briareo and Multimodal Hand Gesture datasets yielded accuracies of 95.49% and 97.25%, respectively. Notably, the approach runs in real time on a standard CPU, distinguishing it from methods that require specialized GPUs. In summary, the hybrid 3D-CNN and Transformer approach addresses both real-time recognition and efficient handling of temporal dependencies, outperforming existing state-of-the-art methods in both accuracy and speed.
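
The self-attention step that models long-range temporal dependencies can be illustrated with a minimal sketch. This is not the authors' implementation: it is a single-head scaled dot-product attention in plain Python, and the sequence length `T`, the embedding size `D`, the random projection matrices, and the random `embeddings` (a stand-in for the per-time-step output of the 3D-CNN) are all assumptions chosen for illustration.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence x (T x D)."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # T x T pairwise similarities
    # Each row holds one time step's attention over all time steps.
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
T, D = 8, 4  # hypothetical: 8 time steps, 4-dimensional embeddings
embeddings = random_matrix(T, D, rng)  # stand-in for 3D-CNN skeleton embeddings
w_q, w_k, w_v = (random_matrix(D, D, rng) for _ in range(3))  # untrained projections

output, weights = self_attention(embeddings, w_q, w_k, w_v)
```

Every row of `weights` sums to 1, so each output time step is a weighted average over the whole sequence. That lets a distant frame influence the current one directly, without passing through intermediate steps.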
