Sign Language Spotting with a Threshold Model Based on Conditional Random Fields

Sign language spotting is the task of detecting and recognizing signs in a signed utterance, in a set vocabulary. The difficulty of sign language spotting is that instances of signs vary in both motion and appearance. Moreover, signs appear within a continuous gesture stream, interspersed with transitional movements between signs in a vocabulary and nonsign patterns (which include out-of-vocabulary signs, epentheses, and other movements that do not correspond to signs). In this paper, a novel method for designing threshold models in a conditional random field (CRF) model is proposed which performs an adaptive threshold for distinguishing between signs in a vocabulary and nonsign patterns. A short-sign detector, a hand appearance-based sign verification method, and a subsign reasoning method are included to further improve sign language spotting accuracy. Experiments demonstrate that our system can spot signs from continuous data with an 87.0 percent spotting rate and can recognize signs from isolated data with a 93.5 percent recognition rate versus 73.5 percent and 85.4 percent, respectively, for CRFs without a threshold model, short-sign detection, subsign reasoning, and hand appearance-based sign verification. Our system can also achieve a 15.0 percent sign error rate (SER) from continuous data and a 6.4 percent SER from isolated data versus 76.2 percent and 14.5 percent, respectively, for conventional CRFs.

[1]  R. Cole,et al.  Survey of the State of the Art in Human Language Technology , 2010 .

[2]  Seong-Whan Lee,et al.  Gesture Spotting and Recognition for Human–Robot Interaction , 2007, IEEE Transactions on Robotics.

[3]  Cristian Sminchisescu,et al.  Conditional models for contextual human motion recognition , 2006, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[4]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[5]  Seong-Whan Lee,et al.  Face Detection and Facial Component Extraction by Wavelet Decomposition and Support Vector Machines , 2003, AVBPA.

[6]  Shan Lu,et al.  Recognition of local features for camera-based sign language recognition system , 2000, Proceedings 15th International Conference on Pattern Recognition. ICPR-2000.

[7]  Stan Sclaroff,et al.  Simultaneous Localization and Recognition of Dynamic Hand Gestures , 2005, 2005 Seventh IEEE Workshops on Applications of Computer Vision (WACV/MOTION'05) - Volume 1.

[8]  Dimitris N. Metaxas,et al.  A Framework for Recognizing the Simultaneous Aspects of American Sign Language , 2001, Comput. Vis. Image Underst..

[9]  Robyn A. Owens,et al.  Australian sign language recognition , 2005, Machine Vision and Applications.

[10]  Thomas G. Dietterich Machine Learning for Sequential Data: A Review , 2002, SSPR/SPR.

[11]  Martin Szummer Learning diagram parts with hidden random fields , 2005, Eighth International Conference on Document Analysis and Recognition (ICDAR'05).

[12]  Surendra Ranganath,et al.  Automatic Sign Language Analysis: A Survey and the Future beyond Lexical Meaning , 2005, IEEE Trans. Pattern Anal. Mach. Intell..

[13]  M. A. Bush,et al.  Training and search algorithms for an interactive wordspotting system , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[14]  Karl-Friedrich Kraiss,et al.  Video-based sign recognition using self-organizing subunits , 2002, Object recognition supported by user interaction for service robots.

[15]  Narendra Ahuja,et al.  Extraction of 2D Motion Trajectories and Its Application to Hand Gesture Recognition , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[16]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[17]  Sudeep Sarkar,et al.  Unsupervised Modeling of Signs Embedded in Continuous Sentences , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Workshops.

[18]  Stan Sclaroff,et al.  Spatiotemporal gesture segmentation , 2006 .

[19]  Alex Pentland,et al.  Real-Time American Sign Language Recognition Using Desk and Wearable Computer Based Video , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[20]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[21]  Jin-Hyung Kim,et al.  An HMM-Based Threshold Model Approach for Gesture Recognition , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[22]  W. Stokoe Sign language structure: an outline of the visual communication systems of the American deaf. 1960. , 1961, Journal of deaf studies and deaf education.

[23]  Trevor Darrell,et al.  Hidden Conditional Random Fields , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Trevor Darrell,et al.  Latent-Dynamic Discriminative Models for Continuous Gesture Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Alex Acero,et al.  Hidden conditional random fields for phone classification , 2005, INTERSPEECH.

[26]  Ali Farhadi,et al.  Aligning ASL for Statistical Translation Using a Discriminative Word Model , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[27]  Stan Sclaroff,et al.  Accurate and Efficient Gesture Spotting via Pruning and Subgesture Reasoning , 2005, ICCV-HCI.

[28]  Sang-Woong Lee,et al.  Multiple Human Detection and Tracking Based on Weighted Temporal Texture Features , 2006, Int. J. Pattern Recognit. Artif. Intell..

[29]  Ruiduo Yang,et al.  Detecting Coarticulation in Sign Language using Conditional Random Fields , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[30]  Wen Gao,et al.  Transition movement models for large vocabulary continuous sign language recognition , 2004, Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004. Proceedings..

[31]  R. Battison,et al.  Lexical Borrowing in American Sign Language , 1978 .

[32]  Andrew McCallum,et al.  Maximum Entropy Markov Models for Information Extraction and Segmentation , 2000, ICML.

[33]  Frederick Jelinek,et al.  Statistical methods for speech recognition , 1997 .

[34]  Ali Farhadi,et al.  Transfer Learning in Sign language , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Richard Rose,et al.  Discriminant wordspotting techniques for rejecting non-vocabulary utterances in unconstrained speech , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[36]  W. Stokoe,et al.  Sign language structure: an outline of the visual communication systems of the American deaf. 1960. , 1961, Journal of deaf studies and deaf education.

[37]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[38]  David Windridge,et al.  A Linguistic Feature Vector for the Visual Interpretation of Sign Language , 2004, ECCV.

[39]  Annelies Braffort ARGo: An Architecture for Sign Language Recognition and Interpretation , 1996, Gesture Workshop.

[40]  Ruiduo Yang,et al.  Enhanced Level Building Algorithm for the Movement Epenthesis Problem in Sign Language Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  Hanna M. Wallach,et al.  Conditional Random Fields: An Introduction , 2004 .