Recent developments in visual sign language recognition

Research in the field of sign language recognition has made significant advances in recent years. These achievements provide the basis for future applications that support the integration of deaf people into hearing society. Translation systems, for example, could facilitate communication between deaf and hearing people in public situations, and further applications, such as user interfaces and the automatic indexing of signed videos, become feasible. The current state of sign language recognition is roughly 30 years behind that of speech recognition, corresponding to the gradual transition from isolated to continuous recognition for small-vocabulary tasks. Research efforts have mainly focused on robust feature extraction and the statistical modeling of signs; however, current recognition systems are still designed for signer-dependent operation under laboratory conditions.

This paper describes a comprehensive concept for robust visual sign language recognition that reflects recent developments in the field. The proposed recognition system aims at signer-independent operation and uses a single video camera for data acquisition to ensure user-friendliness. Since sign languages employ both manual and facial means of expression, both channels are used for recognition. For mobile operation in uncontrolled environments, sophisticated algorithms were developed that robustly extract manual and facial features. Manual feature extraction relies on a multiple-hypothesis tracking approach to resolve ambiguities in hand positions. For facial feature extraction, an active appearance model is applied to identify areas of interest such as the eye and mouth regions; in the next processing step, a numerical description of facial expression, head pose, line of sight, and lip outline is computed. A dedicated resolution strategy handles mutual overlap of the signer's hands and face.

Classification is based on hidden Markov models (HMMs), which compensate for temporal and amplitude variance in the articulation of a sign. The classification stage is designed for the recognition of isolated signs as well as of continuous sign language. In the latter case, a stochastic language model can be employed that considers unigram and bigram probabilities of single and successive signs. For statistical modeling, each sign is represented either as a whole or as a composition of smaller subunits, similar to phonemes in spoken languages. While recognition based on whole-word models is limited to rather small vocabularies, subunit models open the door to large vocabularies.

Achieving signer independence is a challenging problem, as the articulation of a sign is subject to high interpersonal variance. This problem cannot be solved by simple feature normalization alone and must be addressed at the classification level. Therefore, dedicated adaptation methods known from speech recognition were adopted and modified to account for the specifics of sign languages. For rapid adaptation to unknown signers, the proposed system combines maximum likelihood linear regression (MLLR) with maximum a posteriori (MAP) estimation.
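
To illustrate the hand-tracking stage, the following minimal sketch shows how a multiple-hypothesis tracker can defer ambiguous assignments: every candidate region (e.g. a skin-colored blob) extends every surviving trajectory, and implausible combinations are pruned by a motion model. All names, the constant-velocity model, and the Gaussian penalty are illustrative assumptions, not the system's actual implementation.

```python
# Hypothetical MHT sketch; "Hypothesis", "mht_step", and the
# constant-velocity motion model are illustrative assumptions.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Hypothesis:
    track: List[Tuple[float, float]]   # hand positions accumulated so far
    log_score: float = 0.0             # accumulated motion-model log-likelihood

def motion_log_likelihood(track: List[Tuple[float, float]],
                          candidate: Tuple[float, float]) -> float:
    """Score a candidate under an assumed constant-velocity motion model."""
    if len(track) < 2:
        return 0.0                      # no velocity estimate yet
    (x1, y1), (x2, y2) = track[-2], track[-1]
    px, py = 2 * x2 - x1, 2 * y2 - y1   # linear extrapolation of the last step
    dist2 = (candidate[0] - px) ** 2 + (candidate[1] - py) ** 2
    return -dist2 / (2 * 5.0 ** 2)      # Gaussian penalty, sigma = 5 pixels

def mht_step(hypotheses, candidates, beam=10):
    """Extend every hypothesis with every candidate region, prune to a beam."""
    extended = [Hypothesis(h.track + [c],
                           h.log_score + motion_log_likelihood(h.track, c))
                for h in hypotheses for c in candidates]
    extended.sort(key=lambda h: h.log_score, reverse=True)
    return extended[:beam]

# Usage: seed with the first detection, feed per-frame candidate lists,
# and read off the best-scoring trajectory at the end of the sequence.
hyps = [Hypothesis(track=[(100.0, 120.0)])]
for frame_candidates in [[(103.0, 122.0), (40.0, 200.0)],
                         [(106.0, 124.0), (42.0, 198.0)]]:
    hyps = mht_step(hyps, frame_candidates)
print(hyps[0].track)                    # most plausible hand trajectory
```

The point of keeping several hypotheses alive is that a momentarily wrong assignment (e.g. during hand-face overlap) can still be outscored later by the correct trajectory, rather than being committed to frame by frame.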
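The classification stage can be sketched in the same spirit. The forward algorithm below scores a feature sequence against per-sign HMMs for isolated recognition; for continuous recognition, a hypothesized sign sequence additionally receives a weighted unigram/bigram language-model score. The sign names (HOUSE, EAT), model parameters, and emission scores are toy values chosen purely for illustration.

```python
import numpy as np

def forward_log_likelihood(log_pi, log_A, log_B):
    """log P(O | HMM) via the forward algorithm in log space;
    log_B[t, j] = log P(o_t | state j) for the feature vector of frame t."""
    T, _ = log_B.shape
    alpha = log_pi + log_B[0]
    for t in range(1, T):
        # logsumexp over predecessor states, then emit frame t
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[t]
    return np.logaddexp.reduce(alpha)

def sequence_score(segment_scores, signs, log_unigram, log_bigram, lm_weight=1.0):
    """Continuous case: per-segment HMM scores plus the weighted
    unigram/bigram language-model score of the hypothesized sign sequence."""
    lm = log_unigram[signs[0]]
    for prev, cur in zip(signs, signs[1:]):
        lm += log_bigram[(prev, cur)]
    return sum(segment_scores) + lm_weight * lm

# Usage: classify a toy 8-frame feature sequence against two 3-state
# left-to-right sign HMMs (all parameter values are made up).
def left_to_right(stay):
    A = np.array([[stay, 1 - stay, 0.0],
                  [0.0,  stay,     1 - stay],
                  [0.0,  0.0,      1.0]])
    return np.log([1.0, 1e-9, 1e-9]), np.log(A + 1e-12)

rng = np.random.default_rng(0)
models = {"HOUSE": left_to_right(0.6), "EAT": left_to_right(0.8)}
log_B = rng.normal(size=(8, 3))          # stand-in emission log-probabilities
scores = {sign: forward_log_likelihood(pi, A, log_B)
          for sign, (pi, A) in models.items()}
print(max(scores, key=scores.get))       # best-matching isolated sign
# For a continuous hypothesis such as ("HOUSE", "EAT"), sequence_score would
# add log P(HOUSE) + log P(EAT | HOUSE) to the two segment scores.
```

The left-to-right topology with self-loops is what lets an HMM absorb temporal variance: a slowly articulated sign simply spends more frames in each state.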
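Finally, the combined MLLR/MAP adaptation can be outlined as follows: a single affine transform shifts all Gaussian means toward the new signer (the MLLR step), and MAP estimation then refines individual means wherever enough adaptation frames were observed, falling back to the transformed prior elsewhere. This is a deliberately simplified sketch; in particular, the transform is estimated here by occupancy-weighted least squares rather than the full covariance-weighted maximum-likelihood solution, and all variable names are assumptions.

```python
import numpy as np

def mllr_transform(prior_means, adapt_means, occupancies):
    """Estimate one affine transform W so that [mu, 1] @ W matches the
    adaptation data, then shift all means with it. Simplification: occupancy-
    weighted least squares instead of the covariance-weighted ML solution."""
    X = np.hstack([prior_means, np.ones((len(prior_means), 1))])
    w = np.sqrt(occupancies)[:, None]
    W, *_ = np.linalg.lstsq(X * w, adapt_means * w, rcond=None)
    return X @ W                               # globally shifted means

def map_update(prior_means, adapt_means, occupancies, tau=10.0):
    """MAP interpolation: trust the adaptation data where many frames were
    assigned to a Gaussian, fall back to the (shifted) prior elsewhere."""
    g = occupancies[:, None]
    return (tau * prior_means + g * adapt_means) / (tau + g)

# Usage: five Gaussian means in 2-D; the new signer scales and shifts them,
# and only some Gaussians are covered by adaptation frames.
rng = np.random.default_rng(1)
prior = rng.normal(size=(5, 2))
observed = prior @ np.diag([1.1, 0.9]) + 0.5   # toy signer-specific deviation
gamma = np.array([20.0, 15.0, 2.0, 0.5, 0.0])  # adaptation frames per Gaussian
adapted = map_update(mllr_transform(prior, observed, gamma), observed, gamma)
print(adapted)                                 # gamma = 0 keeps the MLLR mean
```

This division of labor is why the combination adapts rapidly: MLLR moves every model with only a handful of adaptation signs, while MAP converges to signer-specific estimates as more data arrives.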
