Non-manual grammatical marker recognition based on multi-scale, spatio-temporal analysis of head pose and facial expressions

Abstract

Changes in eyebrow configuration, in conjunction with other facial expressions and head gestures, signal essential grammatical information in signed languages. This paper proposes an automatic recognition system for non-manual grammatical markers in American Sign Language (ASL) based on a multi-scale, spatio-temporal analysis of head pose and facial expressions. The analysis takes into account the gestural components of these markers, such as raised or lowered eyebrows and different types of periodic head movements. To advance the state of the art in non-manual grammatical marker recognition, we propose a novel multi-scale learning approach that exploits spatio-temporal low-level and high-level facial features. Low-level features are based on information about facial geometry and appearance, as well as head pose, and are obtained through accurate 3D deformable-model-based face tracking. High-level features are based on the identification of gestural events, of varying duration, that constitute the components of linguistic non-manual markers. Specifically, we recognize events such as raised and lowered eyebrows, head nods, and head shakes. We also partition these events into temporal phases: we separate the anticipatory transitional movement (the onset) from the linguistically significant portion of the event, and we further separate the core of the event from the transitional movement that occurs as the articulators return to the neutral position towards the end of the event (the offset). This partitioning is essential for the temporally accurate localization of the grammatical markers, which could not be achieved at this level of precision with previous computer vision methods. In addition, we analyze and use the motion patterns of these non-manual events; those patterns, together with the information about the type of event and its temporal phases, are defined as the high-level features.

Using this multi-scale, spatio-temporal combination of low- and high-level features, we employ learning methods for accurate recognition of non-manual grammatical markers in ASL sentences.
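The onset/core/offset partitioning described above can be illustrated with a simple thresholding pass over a normalized articulator trace (for example, eyebrow height over frames). This is a minimal sketch under assumed conventions, not the system's actual method: the trace values, the `low`/`high` thresholds, and the single-event assumption are all illustrative.

```python
def partition_event(trace, low=0.2, high=0.8):
    """Split a normalized articulator trace (one event) into temporal phases.

    onset : frames rising out of the neutral band toward the event peak
    core  : frames at or above `high` (the linguistically significant portion)
    offset: frames returning from the core back to the neutral position

    Returns three (start, end) frame ranges (end exclusive); a phase that is
    absent is returned as None. Thresholds are illustrative assumptions.
    """
    n = len(trace)
    # First frame that leaves the neutral band: start of the onset.
    onset_start = next((i for i, v in enumerate(trace) if v >= low), None)
    if onset_start is None:
        return None, None, None  # no event in this trace
    # First frame reaching the core threshold: end of onset, start of core.
    core_start = next((i for i, v in enumerate(trace) if v >= high), None)
    if core_start is None:
        return (onset_start, n), None, None  # event never reaches the core
    # Last frame still at or above the core threshold (end exclusive).
    core_end = max(i for i, v in enumerate(trace) if v >= high) + 1
    # Offset runs until the signal re-enters the neutral band.
    offset_end = next((i for i in range(core_end, n) if trace[i] < low), n)
    return (onset_start, core_start), (core_start, core_end), (core_end, offset_end)


# Toy eyebrow-height trace: neutral, rise (onset), plateau (core), fall (offset).
trace = [0.0, 0.1, 0.3, 0.6, 0.9, 1.0, 0.9, 0.5, 0.1, 0.0]
onset, core, offset = partition_event(trace)
```

On this toy trace the sketch yields `onset=(2, 4)`, `core=(4, 7)`, `offset=(7, 8)`: frames 2-3 rise out of neutral, frames 4-6 form the plateau above the core threshold, and frame 7 is the descent back toward neutral. A real system would of course operate on tracked 3D face-model parameters rather than a hand-made trace, and would handle multiple events per sequence.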
