Computer-based recognition of facial expressions in ASL: From face tracking to linguistic interpretation

Most research in sign language recognition has focused on the manual component of signing, even though critical grammatical information is expressed through facial expressions and head gestures. We therefore propose a novel framework for robust tracking and analysis of nonmanual behaviors, with an application to sign language recognition. Our method uses computer vision techniques to track facial expressions and head movements in video in order to recognize these linguistically significant expressions. The methods described here rely crucially on a linguistically annotated video corpus, currently under development, whose annotated examples have served for training and testing our models. We apply our framework to continuous recognition of three classes of grammatical expressions: wh-questions, negative expressions, and topics. Our method is signer-independent and uses spatial pyramids and Hidden Markov Models (HMMs) to model the temporal variations of facial shape and appearance.
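To make the HMM component concrete, the following is a minimal sketch of likelihood-based classification with one HMM per expression class. It is not the implementation described above: the abstract's features come from spatial pyramids over tracked facial shape and appearance, whereas this toy uses two discrete observation symbols, scores a pre-segmented sequence rather than performing continuous recognition, and uses invented parameters. The names `log_likelihood`, `classify`, and `models`, and all probability values, are hypothetical.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM) for a discrete-output HMM.

    obs   -- non-empty list of observation symbol indices
    start -- start[s] is the probability of beginning in state s
    trans -- trans[r][s] is the probability of moving from state r to s
    emit  -- emit[s][k] is the probability of state s emitting symbol k
    """
    n_states = len(start)
    # Forward variable for the first frame.
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    total = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the transitions, then emit obs[t].
            alpha = [
                emit[s][obs[t]]
                * sum(alpha[r] * trans[r][s] for r in range(n_states))
                for s in range(n_states)
            ]
        # Rescale to avoid underflow; the log of the scales sums to log P(obs).
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        total += math.log(scale)
        alpha = [a / scale for a in alpha]
    return total


def classify(obs, models):
    """Return the class whose HMM assigns obs the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))


# Toy, hand-set parameters (assumptions for illustration only).
# Each entry is (start, trans, emit) for a two-state, two-symbol HMM.
sticky = [[0.9, 0.1], [0.1, 0.9]]
models = {
    "wh-question": ([0.5, 0.5], sticky, [[0.9, 0.1], [0.7, 0.3]]),
    "topic": ([0.5, 0.5], sticky, [[0.1, 0.9], [0.3, 0.7]]),
}

print(classify([0, 0, 0, 1, 0], models))
print(classify([1, 1, 0, 1, 1], models))
```

In a real system the per-class parameters would be estimated from annotated training sequences rather than set by hand, and the observations would be continuous feature vectors rather than symbol indices.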
