A Deep Learning Approach for Analyzing Video and Skeletal Features in Sign Language Recognition

Sign language recognition (SLR) refers to the classification of signs with a specific meaning performed by deaf and/or hearing-impaired people in their everyday communication. In this work, we propose a deep learning-based framework in which we examine and analyze the contribution of video (image and optical flow) and skeletal (body, hand and face) features to the challenging task of isolated SLR, where each signed video corresponds to a single word. Moreover, we employ various fusion schemes to identify the optimal way of combining the information obtained from the various feature representations, and we propose a robust SLR methodology. Our experiments on two sign language datasets and the comparison with state-of-the-art SLR methods reveal the superiority of optimally combining skeletal and video features for SLR tasks.
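To make the notion of a fusion scheme concrete, the following is a minimal sketch of two generic ways to combine per-stream information: early fusion (concatenating feature vectors) and late fusion (averaging class probabilities). It is illustrative only and is not the specific set of fusion schemes evaluated in this work; the function names, the stream weights, and the example logits are all assumptions made for the example.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def early_fusion(features):
    """Early fusion: concatenate per-stream feature vectors into one vector."""
    return np.concatenate(features, axis=-1)

def late_fusion(stream_logits, weights=None):
    """Late fusion: weighted average of per-stream class probabilities."""
    probs = np.stack([softmax(l) for l in stream_logits])  # (streams, classes)
    if weights is None:
        weights = np.ones(len(stream_logits))  # equal weights by default
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the result sums to 1
    return np.tensordot(weights, probs, axes=1)  # (classes,)

# Hypothetical logits from three streams over four sign classes.
image_logits = np.array([2.0, 0.5, 0.1, -1.0])
flow_logits = np.array([1.5, 1.0, 0.0, -0.5])
skeleton_logits = np.array([0.2, 2.5, 0.1, 0.0])

fused = late_fusion([image_logits, flow_logits, skeleton_logits])
predicted_class = int(np.argmax(fused))
```

In a sketch like this, early fusion hands a single joint representation to one classifier, whereas late fusion keeps a classifier per stream and merges only their outputs; the stream weights would typically be tuned on validation data.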
