Spelling Correction Real-Time American Sign Language Alphabet Translation System Based on YOLO Network and LSTM

In this paper, we present a novel approach to one of the main challenges in hand gesture recognition from static images: compensating for the accuracy lost when trained models are used to interpret completely unseen data. The proposed model consists of two data-processing stages. In the first stage, a deep neural network (DNN) performs handshape segmentation and classification; multiple architectures and input image sizes were tested and compared to select the best model in terms of accuracy and processing time. For the experiments presented in this work, the DNN models were trained with 24,000 images of 24 signs from the American Sign Language alphabet and fine-tuned with 5,200 images of 26 generated signs. The system was tested in real time with a group of 10 people, yielding a mean average precision of 81.74% and a processing rate of 61.35 frames per second. In the second stage, a bidirectional long short-term memory neural network was implemented and analyzed to add spelling correction capability to the system. It reached a training accuracy of 98.07% with a dictionary of 370 words, thereby increasing robustness on completely unseen data, as shown in our experiments.
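To illustrate how the two stages fit together, the following minimal sketch collapses per-frame letter predictions into a raw word and then corrects it against a dictionary. It is not the method of this paper: the detector output is represented by a plain list of letters, and the bidirectional LSTM corrector is replaced by a simple string-similarity lookup from the Python standard library. The function names, the `min_run` threshold, the `cutoff` value, and the three-word dictionary are all assumptions made for illustration.

```python
from difflib import get_close_matches
from itertools import groupby


def collapse_frames(frame_letters, min_run=3):
    """Turn per-frame letter predictions into a raw letter string.

    A letter is kept only if it is predicted for at least `min_run`
    consecutive frames (hypothetical threshold). None marks a frame
    with no detection, so a repeated letter needs a gap between runs.
    """
    letters = []
    for letter, run in groupby(frame_letters):
        if letter is not None and len(list(run)) >= min_run:
            letters.append(letter)
    return "".join(letters)


def correct_word(raw, dictionary, cutoff=0.6):
    """Return the dictionary word most similar to `raw`.

    Stand-in for the learned spelling corrector: uses difflib's
    similarity ratio instead of a bidirectional LSTM. If nothing
    reaches `cutoff`, the raw string is returned unchanged.
    """
    matches = get_close_matches(raw, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else raw


# Hypothetical detector output: the last sign is misclassified as "D".
frames = (["H"] * 4 + [None] + ["E"] * 3 + ["L"] * 5
          + [None] * 2 + ["L"] * 4 + ["D"] * 3)
dictionary = ["HELLO", "HELP", "WORLD"]

raw_word = collapse_frames(frames)
fixed_word = correct_word(raw_word, dictionary)
print(raw_word, "->", fixed_word)
```

The point of the split is that the second stage only sees a letter string, so a misclassified handshape can be repaired without re-running the detector. A learned corrector such as the one described above would replace `correct_word` while keeping the same interface.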
