Speech recognition techniques for a sign language recognition system

One of the most significant differences between automatic sign language recognition (ASLR) and automatic speech recognition (ASR) lies in the computer vision problems involved, whereas the corresponding problems in speech signal processing have largely been solved through intensive research over the last 30 years. We present an approach that starts from a large vocabulary speech recognition system in order to profit from the insights obtained in ASR research. The developed system is able to recognize sentences of continuous sign language independently of the speaker. The features used are obtained from standard video cameras without any special data acquisition devices. In particular, we focus on feature and model combination techniques applied in ASR, and on the use of pronunciation and language models (LM) in sign language. These techniques can be used for all kinds of sign language recognition systems, and for many video analysis problems where the temporal context is important, e.g. action or gesture recognition. On a publicly available benchmark database consisting of 201 sentences and 3 signers, we achieve a 17% word error rate (WER).
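The ASR-style combination described above can be illustrated with a minimal sketch: several visual feature streams each yield a log-probability for a hypothesized sign sequence, these are combined log-linearly with stream weights, and a scaled language-model score is added before the Bayes decision rule picks the best hypothesis. This is an illustrative sketch only, not the authors' actual system; the function names, weights, and toy hypotheses are hypothetical.

```python
def log_linear_combination(stream_log_probs, weights):
    """Combine per-stream log-probabilities log p_i(x|w) with
    exponent weights lambda_i: score = sum_i lambda_i * log p_i(x|w)."""
    assert len(stream_log_probs) == len(weights)
    return sum(w * lp for w, lp in zip(weights, stream_log_probs))


def recognize(hypotheses, weights, lm_scale):
    """Bayes decision rule: pick the word sequence maximizing the
    combined visual score plus a scaled language-model score."""
    best = max(
        hypotheses,
        key=lambda h: log_linear_combination(h["stream_log_probs"], weights)
        + lm_scale * h["lm_log_prob"],
    )
    return best["words"]


# Toy example: two competing sign-sequence hypotheses, two feature streams.
hypotheses = [
    {"words": "MY NAME", "stream_log_probs": [-10.0, -12.0], "lm_log_prob": -2.0},
    {"words": "MY NAIVE", "stream_log_probs": [-9.0, -15.0], "lm_log_prob": -6.0},
]
print(recognize(hypotheses, weights=[0.6, 0.4], lm_scale=1.0))  # → MY NAME
```

Here the second hypothesis has a better score on one stream, but the combined visual evidence and the language model together favor the first, which is exactly the effect of LM integration the abstract refers to.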
