Improving Continuous Sign Language Recognition: Speech Recognition Techniques and System Design

Automatic sign language recognition (ASLR) is a special case of automatic speech recognition (ASR) and computer vision (CV) and is currently evolving from artificial, lab-generated data to "real-life" data. Although ASLR still struggles with feature extraction, it can benefit from techniques developed for ASR. We present a large-vocabulary ASLR system that recognizes sentences in continuous sign language using features extracted from standard single-view video cameras, without additional equipment. We investigate ASR techniques such as the multi-layer perceptron (MLP) tandem approach, speaker adaptation, pronunciation modelling, and parallel hidden Markov models, and evaluate the influence of each system component on recognition performance. On two publicly available large-vocabulary databases representing lab data (25 signers, 455-sign vocabulary, 19k sentences) and unconstrained "real-life" sign language (1 signer, 266-sign vocabulary, 351 sentences), we achieve 22.1% and 38.6% WER, respectively.
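The results above are reported as word error rate (WER), the standard ASR metric carried over to sign language recognition: the word-level Levenshtein distance (substitutions + deletions + insertions) between hypothesis and reference, divided by the reference length. A minimal sketch, assuming whitespace-tokenized gloss sequences (the function name and tokenization are illustrative, not from the paper):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dp[i - 1][j] + 1
            insertion = dp[i][j - 1] + 1
            dp[i][j] = min(substitution, deletion, insertion)
    return dp[len(ref)][len(hyp)] / len(ref)


print(wer("WEATHER TOMORROW RAIN", "WEATHER TOMORROW RAIN"))  # 0.0
print(wer("WEATHER TOMORROW RAIN", "WEATHER SUN"))            # one sub + one del
```

Note that WER can exceed 100% when the hypothesis contains many insertions, which is why error rates on hard, unconstrained data (such as the 38.6% figure above) are not bounded the way accuracies are.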
