Combining depth image and skeleton data from Kinect for recognizing words in the sign system for Indonesian language (SIBI [Sistem Isyarat Bahasa Indonesia])

The Sign System for Indonesian Language (SIBI) is a rather complex sign language. Four components distinguish the meaning of a sign, and the system follows the syntax and grammar of the Indonesian language. This paper proposes a model for recognizing SIBI words using Microsoft Kinect as the input sensor. The model is part of an automatic translation system from SIBI to text. The features for each word are extracted from the skeleton and color-depth data produced by Kinect. Skeleton features are the angles between human joints and the Cartesian axes. Color images are converted to gray-scale, and their features are extracted using the Discrete Cosine Transform (DCT) with a Cross Correlation (CC) operation. Depth image features are extracted by running the MATLAB regionprops function to obtain region properties. The Generalized Learning Vector Quantization (GLVQ) and Random Forest (RF) training algorithms from the WEKA data mining tools are used as the classifiers of the model. Experiments across several scenarios show that the highest accuracy (96.67%) is obtained when 30 frames of skeleton features are combined with 20 frames of image region-property features and classified by Random Forest.
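As a rough illustration of the skeleton feature described above, the following sketch computes the angles between a joint-to-joint vector and the three Cartesian axes. It is only one plausible reading of "angle between human joints and Cartesian axes": the function name `axis_angles`, the choice of a shoulder-to-elbow vector, and the sample coordinates are all assumptions for illustration and are not taken from the paper.

```python
import math

def axis_angles(joint_a, joint_b):
    """Angles in degrees between the vector joint_a -> joint_b and the X, Y, Z axes.

    Hypothetical helper: the paper does not specify how the angles are computed.
    """
    v = [b - a for a, b in zip(joint_a, joint_b)]
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("joints coincide; direction is undefined")
    # Each direction cosine is the component divided by the vector length.
    # Clamping guards against rounding slightly outside [-1, 1].
    return tuple(
        math.degrees(math.acos(max(-1.0, min(1.0, c / norm)))) for c in v
    )

# Made-up joint positions (not real Kinect output).
shoulder = (0.0, 0.0, 0.0)
elbow = (1.0, 0.0, 0.0)
angles = axis_angles(shoulder, elbow)
```

Under this reading, each frame would contribute three angles per joint pair, and concatenating them over a fixed number of frames (for example the 30 frames mentioned above) would give a fixed-length feature vector for the classifier.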
