Multi-Modal Deep Hand Sign Language Recognition in Still Images Using Restricted Boltzmann Machine

In this paper, a deep learning approach, the Restricted Boltzmann Machine (RBM), is used to perform automatic hand sign language recognition from visual data. We evaluate how the RBM, as a deep generative model, can learn the distribution of the input data and thereby improve recognition of unseen data. Two modalities, RGB and depth, are fed to the model in three forms: the original image, the cropped image, and a noisy cropped image. Five crops of the input image are generated, and the hand in each crop is detected using a Convolutional Neural Network (CNN). The three forms of the detected hand image are then produced for each modality and fed to separate RBMs. The outputs of the RBMs for the two modalities are fused in another RBM in order to recognize the sign label of the input image. The proposed multi-modal model is trained on all, or a subset, of the American alphabet and digits from four publicly available datasets, and its robustness to noise is also evaluated. Experimental results show that the proposed multi-modal model, using the cropping and RBM-based fusion methodology, achieves state-of-the-art results on the Massey University Gesture Dataset 2012, the American Sign Language (ASL) Fingerspelling Dataset from the University of Surrey's Centre for Vision, Speech and Signal Processing, the NYU dataset, and the ASL Fingerspelling A dataset.
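As a concrete illustration of the fusion idea described above, the following minimal NumPy sketch trains one Bernoulli RBM per modality with one-step contrastive divergence and then fuses the two modality-specific hidden codes in a third RBM. This is not the authors' implementation: the input sizes, hyperparameters, and the random stand-in data are illustrative assumptions, and the CNN-based hand detection and the three image forms (original, cropped, noisy cropped) are omitted for brevity.

```python
# Minimal sketch of the two-stream RBM fusion: one RBM per modality learns a
# hidden representation, and a third RBM fuses the concatenated hidden codes.
# All shapes, hyperparameters, and data here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli-Bernoulli RBM trained with 1-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, lr=0.05):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0):
        # Positive phase: hidden probabilities and a binary sample.
        h0 = self.hidden_probs(v0)
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        # Negative phase: one Gibbs step back to the visible layer and up again.
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        # Contrastive-divergence parameter updates.
        batch = v0.shape[0]
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / batch
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (h0 - h1).mean(axis=0)

    def fit(self, data, epochs=10, batch_size=32):
        for _ in range(epochs):
            for i in range(0, len(data), batch_size):
                self.cd1_step(data[i:i + batch_size])
        return self


if __name__ == "__main__":
    # Stand-ins for flattened, binarized hand crops (e.g. 32x32 = 1024 pixels).
    rgb_feats = (rng.random((256, 1024)) > 0.5).astype(float)
    depth_feats = (rng.random((256, 1024)) > 0.5).astype(float)

    rbm_rgb = RBM(1024, 128).fit(rgb_feats)
    rbm_depth = RBM(1024, 128).fit(depth_feats)

    # Fuse the two modality-specific hidden codes in a third RBM.
    fused_input = np.hstack([rbm_rgb.hidden_probs(rgb_feats),
                             rbm_depth.hidden_probs(depth_feats)])
    rbm_fusion = RBM(256, 64).fit(fused_input)

    joint_code = rbm_fusion.hidden_probs(fused_input)
    print("joint representation:", joint_code.shape)  # (256, 64)
```

In the paper, the sign label is recognized from the fused representation; a downstream classifier on top of the joint code is left out of this sketch.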
