A feature covariance matrix with serial particle filter for isolated sign language recognition

A fusion of median and mode filtering for a better background model. A serial particle filter that can better detect and track the object of interest. A novel covariance matrix feature for isolated sign language representation.

Sign language recognition is widely recognized as a very challenging visual recognition problem. In this paper, we propose a feature covariance matrix based serial particle filter for isolated sign language recognition. At the preprocessing stage, a fusion of the median and mode filters is employed to extract the foreground and thereby enhance hand detection. We propose to track the hands of the signer serially, as opposed to tracking both hands at the same time, to reduce the misdirection of target objects. Subsequently, the region around the tracked hands is extracted to generate the feature covariance matrix, a compact representation of the tracked hand gesture that reduces the dimensionality of the features. In addition, the proposed feature covariance matrix is able to adapt to new signs without any retraining process, owing to its ability to integrate multiple correlated features in a natural way. The experimental results show that the hand trajectories obtained through the proposed serial hand tracking are closer to the ground truth. Sign gesture recognition based on the proposed methods yields an 87.33% recognition rate for American Sign Language. The proposed hand tracking and feature extraction methodology is an important milestone in the development of expert systems designed for sign language recognition, such as automated sign language translation systems.
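To illustrate how a covariance matrix can act as a compact region descriptor, the following is a minimal sketch and not the paper's implementation. The abstract does not list the per-pixel features used, so the feature set here (pixel coordinates, intensity, and absolute first derivatives) is an assumption, as are the function names `covariance_descriptor` and `covariance_distance`. The distance shown is a generalized-eigenvalue metric often used to compare covariance descriptors; whether the paper uses it is not stated in the abstract.

```python
import numpy as np


def covariance_descriptor(patch):
    """Return the 5x5 covariance of per-pixel features over a grayscale patch.

    Assumed feature vector per pixel: [x, y, intensity, |dI/dx|, |dI/dy|].
    """
    patch = np.asarray(patch, dtype=np.float64)
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]          # row and column coordinates
    dy, dx = np.gradient(patch)          # derivatives along rows, then columns
    features = np.stack(
        [
            xs.ravel(),
            ys.ravel(),
            patch.ravel(),
            np.abs(dx).ravel(),
            np.abs(dy).ravel(),
        ],
        axis=0,
    )
    # np.cov treats each row as one variable, giving a 5x5 matrix
    # regardless of the patch size.
    return np.cov(features)


def covariance_distance(c1, c2, eps=1e-6):
    """Distance based on the generalized eigenvalues of two covariance matrices.

    A small ridge (eps) is added so both matrices are safely invertible.
    """
    d = c1.shape[0]
    a = c1 + eps * np.eye(d)
    b = c2 + eps * np.eye(d)
    # Eigenvalues of a^{-1} b are the generalized eigenvalues of (b, a).
    lam = np.linalg.eigvals(np.linalg.solve(a, b)).real
    lam = np.clip(lam, 1e-12, None)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))
```

Because the descriptor size depends only on the number of features and not on the region size, hand regions of different dimensions can be compared directly, which is the dimensionality reduction the abstract refers to. A new sign could be enrolled by storing its descriptor and matching by nearest distance, under the assumption that a nearest-neighbor style comparison is acceptable.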
