3D sign language recognition with joint distance and angular coded color topographical descriptor on a 2-stream CNN

Abstract: 3D sign language recognition is currently one of the most challenging and interesting problems in human action recognition (HAR). A sign in a 3D video can be characterized by the 3D locations of the joints over time. The objective of this study is therefore to construct color-coded topographical descriptors from the joint distances and joint angles computed from these joint locations. We call the two resulting color-coded images the joint distance topographical descriptor (JDTD) and the joint angle topographical descriptor (JATD), respectively. For classification, we propose a two-stream convolutional neural network (2CNN) architecture that takes the color-coded JDTD and JATD images as input. The two independent streams are merged in the dense layer, where the features from both streams are concatenated. For a given query 3D sign (or action), a list of class scores is obtained and reported as the text label corresponding to the sign. The results show improved classifier performance over earlier methods, which we attribute to combining distance and angular features when predicting closely related spatio-temporal discriminative features. To benchmark the proposed model, we compared our results with state-of-the-art baseline action recognition frameworks on our own 3D sign language dataset and on two publicly available 3D mocap action datasets, HDM05 and CMU.
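As a rough illustration of the descriptor idea, the sketch below turns a sequence of 3D joint locations into two color-coded images, one from pairwise joint distances and one from pairwise joint angles. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not define the angle or the color mapping, so the angle is assumed here to be the angle between the position vectors of two joints, and the color coding is assumed to be a simple blue-green-red ramp over min-max normalized values. The function names and the `(frames, joints, 3)` array layout are hypothetical.

```python
import numpy as np


def joint_distance_map(joints):
    """Pairwise Euclidean joint distances per frame.

    joints: array of shape (T, J, 3), i.e. T frames of J 3D joint locations.
    Returns an array of shape (P, T), where P = J * (J - 1) / 2 joint pairs.
    """
    i, j = np.triu_indices(joints.shape[1], k=1)  # all unordered joint pairs
    diff = joints[:, i, :] - joints[:, j, :]      # (T, P, 3)
    return np.linalg.norm(diff, axis=-1).T        # (P, T)


def joint_angle_map(joints, eps=1e-8):
    """Angle in radians between the position vectors of each joint pair.

    ASSUMPTION: angles are measured at the coordinate origin; the source
    does not specify how the joint angles are defined.
    """
    i, j = np.triu_indices(joints.shape[1], k=1)
    a, b = joints[:, i, :], joints[:, j, :]
    norms = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    cos = np.sum(a * b, axis=-1) / (norms + eps)  # eps avoids division by zero
    return np.arccos(np.clip(cos, -1.0, 1.0)).T   # (P, T)


def color_code(values):
    """Map a 2D array to an RGB uint8 image of shape (rows, cols, 3).

    ASSUMPTION: min-max normalization followed by a blue -> green -> red
    ramp; the color map used in the paper may differ.
    """
    lo, hi = values.min(), values.max()
    v = (values - lo) / (hi - lo) if hi > lo else np.zeros_like(values)
    r = np.clip(2.0 * v - 1.0, 0.0, 1.0)  # red rises over the upper half
    b = np.clip(1.0 - 2.0 * v, 0.0, 1.0)  # blue falls over the lower half
    g = 1.0 - r - b                       # green peaks at the midpoint
    return (np.stack([r, g, b], axis=-1) * 255).round().astype(np.uint8)


# Usage with synthetic data: 12 frames, 5 joints (placeholder values).
rng = np.random.default_rng(0)
sequence = rng.normal(size=(12, 5, 3))
jdtd = color_code(joint_distance_map(sequence))  # distance image
jatd = color_code(joint_angle_map(sequence))     # angle image
print(jdtd.shape, jatd.shape)
```

Each image has one row per joint pair and one column per frame, so the two images share the same shape and could be fed to the two input streams of a CNN after resizing to a fixed size.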
