Fingerspelling Recognition with Semi-Markov Conditional Random Fields

Recognition of gesture sequences is in general a very difficult problem, but in certain domains the difficulty may be mitigated by exploiting the domain's "grammar". One such grammatically constrained gesture sequence domain is sign language. In this paper we investigate the case of fingerspelling recognition, which can be very challenging due to the quick, small motions of the fingers. Most prior work on this task has assumed a closed vocabulary of fingerspelled words; here we study the more natural open-vocabulary case, where the only domain knowledge is the set of possible fingerspelled letters and the statistics of their sequences. We develop a semi-Markov conditional model approach, in which feature functions are defined over segments of video and their corresponding letter labels. We use classifiers of letters and linguistic handshape features, along with expected motion profiles, to define the segmental feature functions. This approach improves the letter error rate (Levenshtein distance between hypothesized and correct letter sequences) from 16.3% using a hidden Markov model baseline to 11.6% using the proposed semi-Markov model.
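
As an illustration of the evaluation metric, the letter error rate can be sketched as follows. This is a minimal sketch, not the authors' evaluation code: the function names `levenshtein` and `letter_error_rate` are hypothetical, and normalizing the total edit distance by the total number of reference letters is an assumption, since the abstract only states that the metric is based on Levenshtein distance.

```python
def levenshtein(ref, hyp):
    """Edit distance between two letter sequences (unit-cost
    insertions, deletions, and substitutions)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, 1):
        curr = [i]  # aligning ref[:i] with an empty hypothesis costs i deletions
        for j, h in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def letter_error_rate(references, hypotheses):
    """Total edit distance divided by total reference length.
    Assumption: normalization by reference length, as is common
    for error rates of this kind."""
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    total_letters = sum(len(r) for r in references)
    return total_edits / total_letters
```

For example, `letter_error_rate(["abc"], ["abd"])` counts one substitution over three reference letters.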
