Combination of manual and non-manual features for sign language recognition based on conditional random field and active appearance model

Sign language recognition is the task of detecting and recognizing manual signals (MSs) and non-manual signals (NMSs) in a signed utterance. In this paper, a novel method for recognizing MSs and facial expressions as NMSs is proposed. This is achieved through a framework consisting of three components: (1) Candidate segments of MSs are discriminated using a hierarchical conditional random field (CRF) and BoostMap embedding, which can distinguish signs, fingerspellings, and non-sign patterns, and is robust to variations in the size, scale, and rotation of the signer's hand. (2) Facial expressions as NMSs are recognized with a support vector machine (SVM) and an active appearance model (AAM): the AAM is used to extract facial feature points, from which several measurements are computed to classify each facial component into predefined facial expressions with the SVM. (3) Finally, the recognition results for MSs and NMSs are fused in order to recognize signed sentences. Experiments demonstrate that the proposed method can successfully combine MS and NMS features to recognize signed sentences from utterance data.
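
To make step (1) concrete, the sketch below illustrates the core BoostMap idea behind the embedding: each reference hand shape r defines a one-dimensional embedding F_r(q) = d(q, r), and a weighted L1 distance in the embedded space cheaply approximates the expensive shape distance for nearest-neighbor matching. This is a minimal illustration only; the paper's hierarchical CRF stage is omitted, and `expensive_distance`, `BoostMapSketch`, and the synthetic data are hypothetical stand-ins.

```python
import numpy as np

def expensive_distance(a, b):
    # Stand-in for a costly shape distance (e.g. a chamfer distance
    # between edge images); plain Euclidean distance for illustration.
    return np.linalg.norm(a - b)

class BoostMapSketch:
    """Embeds objects as vectors of distances to reference exemplars."""

    def __init__(self, references, weights=None):
        # Each reference exemplar r defines a 1D embedding F_r(q) = d(q, r);
        # BoostMap proper learns the references and weights with AdaBoost.
        self.references = references
        self.weights = np.ones(len(references)) if weights is None else weights

    def embed(self, q):
        return np.array([expensive_distance(q, r) for r in self.references])

    def embedded_distance(self, e1, e2):
        # Weighted L1 distance in the embedded space approximates the
        # original expensive distance, enabling fast retrieval.
        return float(np.sum(self.weights * np.abs(e1 - e2)))

# Usage: match a query hand descriptor against a labelled database.
rng = np.random.default_rng(0)
database = [(rng.normal(size=64), label)
            for label in ("sign_A", "sign_B", "fingerspell_C")
            for _ in range(5)]
bm = BoostMapSketch([x for x, _ in database[:4]])
embedded_db = [(bm.embed(x), label) for x, label in database]

query = rng.normal(size=64)
eq = bm.embed(query)
best = min(embedded_db, key=lambda item: bm.embedded_distance(eq, item[0]))
print("nearest hand-shape class:", best[1])
```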

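Steps (2) and (3) can be sketched in the same spirit: measurements derived from AAM feature points are fed to an SVM, and the resulting per-frame NMS posterior is fused with the MS score. The AAM fitting itself is not shown, and the measurement names, synthetic training data, and the simple product-fusion rule below are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical measurements computed from AAM feature points, e.g.
# eyebrow-to-eye distance, mouth opening, mouth-corner slope.
rng = np.random.default_rng(0)
prototypes = {
    "neutral":    [0.9, 0.10, 0.00],
    "brow_raise": [1.4, 0.20, 0.10],
    "mouth_open": [0.8, 0.70, -0.30],
}
X_train, y_train = [], []
for label, center in prototypes.items():
    for _ in range(10):  # synthetic samples around each prototype
        X_train.append(np.asarray(center) + rng.normal(scale=0.05, size=3))
        y_train.append(label)

clf = SVC(probability=True).fit(X_train, y_train)

def fuse(ms_probs, nms_probs):
    # Naive late fusion: combine MS and NMS posteriors by product and
    # renormalize (a placeholder for the paper's fusion step).
    combined = {(s, e): p * q
                for s, p in ms_probs.items()
                for e, q in nms_probs.items()}
    total = sum(combined.values())
    return {k: v / total for k, v in combined.items()}

# Per-frame NMS posterior from the SVM...
frame = np.array([[1.35, 0.22, 0.08]])
nms_probs = dict(zip(clf.classes_, clf.predict_proba(frame)[0]))
# ...fused with an MS posterior, e.g. from the CRF spotting stage.
ms_probs = {"QUESTION": 0.6, "STATEMENT": 0.4}
best = max(fuse(ms_probs, nms_probs).items(), key=lambda kv: kv[1])
print("fused (sign, expression):", best[0])
```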