Deep Learning for Vietnamese Sign Language Recognition in Video Sequence

doi: 10.18178/ijmlc.2019.9.4.823

Abstract: For most hearing-impaired people in Vietnam, Vietnamese Sign Language (VSL) is the only means of communication. A growing number of studies therefore address the automatic translation of VSL, with the aim of bridging communication between hearing-impaired and hearing people. However, automatic VSL recognition in video is challenging because of variation in camera orientation, hand position and movement, the relation between the two hands, and similar factors. In this paper, we present several feature extraction approaches for VSL recognition, including spatial and scene-based features. Instead of relying on a static image, we specifically capture motion information between frames in a video sequence. For the recognition task, besides a traditional sign language recognition method such as SVM, we propose using a deep learning technique to model the dependence between frames in a video sequence. We collected two VSL datasets of words for family relatives (VSL-WRF), such as father, mother, uncle, and aunt. The first contains 12 Vietnamese words whose signs change only slightly between frames. The second contains 15 words whose gestures involve the relative position of body parts and the orientation of the motion. In addition, we propose a data augmentation technique to obtain more information about hand movement and hand position. The experiments achieved satisfactory results, with accuracies of 88.5% (traditional SVM) and 95.83% (deep learning). This indicates that deep learning combined with data augmentation provides more information about the orientation and movement of the hand, and can improve the performance of a VSL recognition system.
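The abstract does not say how motion between frames is captured. As a minimal sketch only, and not the authors' method, the following assumes the simplest possible motion cue: the mean absolute pixel difference between consecutive grayscale frames. The name `motion_features` and the tiny 2x2 clip are hypothetical and serve purely as illustration.

```python
from typing import List

# One grayscale frame as rows of pixel intensities (assumed representation).
Frame = List[List[float]]


def motion_features(frames: List[Frame]) -> List[float]:
    """Mean absolute pixel change between each pair of consecutive frames."""
    features = []
    for prev, curr in zip(frames, frames[1:]):
        total = 0.0
        count = 0
        for row_prev, row_curr in zip(prev, curr):
            for p, c in zip(row_prev, row_curr):
                total += abs(c - p)
                count += 1
        # An empty frame contributes zero motion rather than dividing by zero.
        features.append(total / count if count else 0.0)
    return features


# Hypothetical 3-frame clip: a bright pixel moves along the top row, then stays.
clip = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 1.0], [0.0, 0.0]],
    [[0.0, 1.0], [0.0, 0.0]],
]
motion = motion_features(clip)
```

A clip of N frames yields N-1 values. A sequence like this could be fed to a classifier alongside per-frame spatial features, though the descriptors actually used in the paper may differ.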
