Automatic recognition of facial expressions using hidden Markov models and estimation of expression intensity

Facial expressions provide sensitive cues about emotional responses and play a major role in the study of psychological phenomena and the development of nonverbal communication. Facial expressions regulate social behavior, signal communicative intent, and are related to speech production. Most facial expression recognition systems focus on only six basic expressions. In everyday life, however, these six basic expressions occur relatively infrequently, and emotion or intent is more often communicated by subtle changes in one or two discrete features, such as tightening of the lips, which may communicate anger. Humans are capable of producing thousands of expressions that vary in complexity, intensity, and meaning. The objective of this dissertation is to develop a computer vision system, including both facial feature extraction and recognition, that automatically discriminates among subtly different facial expressions based on Facial Action Coding System (FACS) action units (AUs) using Hidden Markov Models (HMMs). Three methods are developed to extract facial expression information for automatic recognition. The first method is facial feature point tracking using the coarse-to-fine pyramid method, which is sensitive to subtle feature motion and capable of handling large displacements with sub-pixel accuracy. The second is dense flow tracking together with principal component analysis, where the entire facial motion information per frame is compressed to a low-dimensional weight vector for discrimination. The third is high gradient component (i.e., furrow) analysis in the spatio-temporal domain, which exploits the transient variance associated with the facial expression. Upon extraction of the facial information, non-rigid facial expressions are separated from the rigid head motion components, and the face images are automatically aligned and normalized using an affine transformation.
The resulting motion vector sequence is vector quantized to provide input to an HMM-based classifier, which addresses the time warping problem. A method is developed for determining the optimal HMM topology for our recognition system. The system also provides expression intensity estimation, since intensity has a significant effect on the actual meaning of an expression. We have studied more than 400 image sequences obtained from 90 subjects. In experiments, the trained system achieved an overall recognition accuracy of 87%, and likewise 87% when distinguishing among sets of three and six subtly different facial expressions for the upper and lower facial regions, respectively.
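
The quantize-then-classify step described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: the codebook, the two-state left-to-right topology, the probability values, and the labels `brow_raise` and `brow_lower` are all hypothetical and chosen only for illustration. Each motion vector is mapped to its nearest codeword, and the resulting symbol sequence is scored against one discrete HMM per expression with the scaled forward algorithm.

```python
import math

def quantize(vectors, codebook):
    """Map each motion vector to the index of its nearest codeword
    (squared Euclidean distance)."""
    symbols = []
    for v in vectors:
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        symbols.append(dists.index(min(dists)))
    return symbols

def log_likelihood(symbols, start, trans, emit):
    """Scaled forward algorithm: returns log P(symbols | HMM)."""
    n = len(start)
    alpha = [start[i] * emit[i][symbols[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(symbols)):
        if t > 0:
            # Propagate one step through the transition matrix, then emit.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbols[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_p

def classify(symbols, models):
    """Pick the model under which the symbol sequence is most likely."""
    return max(models, key=lambda name: log_likelihood(symbols, *models[name]))

# Hypothetical 2-D codebook: no motion, upward motion, downward motion.
codebook = [(0.0, 0.0), (0.0, 1.0), (0.0, -1.0)]

# Hypothetical left-to-right HMMs: state 0 (onset) can only stay or advance.
start = [1.0, 0.0]
trans = [[0.6, 0.4], [0.0, 1.0]]
models = {
    "brow_raise": (start, trans, [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
    "brow_lower": (start, trans, [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]),
}

# Illustrative (made-up) motion vectors for one tracked feature point.
observed = [(0.05, 0.1), (0.0, 0.2), (0.1, 0.9), (0.0, 1.1), (-0.1, 0.8)]
symbols = quantize(observed, codebook)
label = classify(symbols, models)
```

Because the self-transitions let a state absorb any number of frames, a slower rendition of the same movement (more frames per state) is scored by the same model, which is the sense in which an HMM handles time warping.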
