Automatic Analysis of Spontaneous Facial Behavior: A Final Project Report

The Facial Action Coding System (FACS) is the leading standard for measuring facial expressions in the behavioral sciences (Ekman & Friesen, 1978). FACS coding is currently performed manually by human experts; it is slow and requires extensive training. Automating FACS coding could have revolutionary effects on our understanding of human facial expression and on the development of computer systems that understand facial expressions. Two teams, one at the University of California, San Diego and the Salk Institute, and another at the University of Pittsburgh and Carnegie Mellon University, were challenged to develop prototype systems for automatic recognition of spontaneous facial expressions. Working with spontaneous expressions required solving technical and theoretical challenges that had not previously been addressed in the field.

This document describes the system developed by the UCSD team. The approach employs 3-D pose estimation and warping techniques to reduce image variability due to general changes in pose. Machine learning techniques are then applied directly to the warped images or to biologically inspired representations of these images. No effort is made to detect contours or other hand-crafted image features. The system employs general-purpose learning mechanisms that can be applied to the recognition of any action unit; the approach is parsimonious and does not require defining a different set of feature parameters or image operations for each facial action. The system was tested on a set of eyelid and eyebrow movements and successfully identified these movements in novel subjects. We showed that 3-D tracking and warping, followed by machine learning applied directly to the warped images, is a viable and promising technology for automatic facial action recognition. One exciting aspect of the approach presented here is that information about movement dynamics emerged from filters that were derived from the statistics of images.

We believe all the pieces of the puzzle are in place for the development of automated systems that recognize spontaneous facial actions at the level of detail required by FACS. The main factor impeding progress in this field is the lack of databases large enough to train a greater variety of action units and to serve as a standard for comparison between different approaches. Based on our experience in this project, we estimate that a database of 500 subjects, with 1 minute of rich facial behavior per subject, would be sufficient to produce dramatic improvements in the field.
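To make the processing chain concrete, the following is a minimal sketch of the kind of pipeline described above: estimate 3-D head pose from tracked landmarks, warp the face to a canonical frontal view, compute a biologically inspired representation (a Gabor filter bank), and train a general-purpose classifier per action unit. This is an illustrative outline only, not the project's implementation; the landmark set, 3-D reference points, filter parameters, and the choice of a linear support vector machine are assumptions made for the sketch, assuming OpenCV, NumPy, and scikit-learn are available.

    # Illustrative sketch only (not the project's code). The 3-D model points,
    # canonical landmark positions, and filter parameters are placeholders.
    import cv2
    import numpy as np
    from sklearn.svm import SVC

    def estimate_pose(landmarks_2d, model_points_3d, camera_matrix):
        """Step 1: recover 3-D head rotation/translation with a PnP solver."""
        ok, rvec, tvec = cv2.solvePnP(model_points_3d.astype(np.float64),
                                      landmarks_2d.astype(np.float64),
                                      camera_matrix, None)
        return rvec, tvec

    def warp_to_frontal(frame, landmarks_2d, canonical_2d, size=(96, 96)):
        """Step 2: warp the face so landmarks land on canonical frontal positions."""
        H, _ = cv2.findHomography(landmarks_2d, canonical_2d)
        return cv2.warpPerspective(frame, H, size)

    def gabor_features(gray_face, n_orientations=8, sigmas=(2.0, 4.0)):
        """Step 3: biologically inspired representation via a Gabor filter bank."""
        responses = []
        for sigma in sigmas:
            for k in range(n_orientations):
                theta = np.pi * k / n_orientations
                kernel = cv2.getGaborKernel((31, 31), sigma, theta, 4.0 * sigma, 0.5)
                responses.append(np.abs(cv2.filter2D(gray_face, cv2.CV_32F, kernel)).ravel())
        return np.concatenate(responses)

    def train_action_unit_classifier(feature_vectors, labels):
        """Step 4: one general-purpose learner per FACS action unit."""
        classifier = SVC(kernel="linear")  # any generic learner could be substituted
        classifier.fit(feature_vectors, labels)
        return classifier

In a complete system the estimated pose would drive the warp (for example, by rotating a 3-D face model before projection), and the per-frame classifier outputs would be integrated over time to capture movement dynamics.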

[1] Marian Stewart Bartlett et al. Image Representations for Facial Expression Coding, 1999, NIPS.

[2] Kenji Mase et al. Recognition of Facial Expression from Optical Flow, 1991.

[3] Javier R. Movellan et al. Diffusion Networks, Products of Experts, and Factor Analysis, 2001.

[4] Norbert Krüger et al. Face Recognition by Elastic Bunch Graph Matching, 1997, CAIP.

[5] Garrison W. Cottrell et al. EMPATH: Face, Emotion, and Gender Recognition Using Holons, 1990, NIPS.

[6] G. Kitagawa. Monte Carlo Filter and Smoother for Non-Gaussian Nonlinear State Space Models, 1996.

[7] H. Sebastian Seung et al. Learning the parts of objects by non-negative matrix factorization, 1999, Nature.

[8] Gregory D. Hager et al. Fast and Globally Convergent Pose Estimation from Video Images, 2000, IEEE Trans. Pattern Anal. Mach. Intell.

[9] Lawrence R. Rabiner et al. A tutorial on hidden Markov models and selected applications in speech recognition, 1989, Proc. IEEE.

[10] M. Turk et al. Eigenfaces for Recognition, 1991, Journal of Cognitive Neuroscience.

[11] Marian Stewart Bartlett et al. A comparison of Gabor filter methods for automatic detection of facial landmarks, 2002, Proceedings of Fifth IEEE International Conference on Automatic Face Gesture Recognition.

[12] Takeo Kanade et al. Detection, tracking, and classification of action units in facial expression, 2000, Robotics Auton. Syst.

[13] Elie Bienenstock et al. Neural Networks and the Bias/Variance Dilemma, 1992, Neural Computation.

[14] Javier R. Movellan et al. Visual Speech Recognition with Stochastic Networks, 1994, NIPS.

[15] David Salesin et al. Synthesizing realistic facial expressions from photographs, 1998, SIGGRAPH.

[16] Marian Stewart Bartlett et al. Computer Recognition of Facial Actions: A study of co-articulation effects, 2001.

[17] Zhengyou Zhang et al. Comparison between geometry-based and Gabor-wavelets-based facial expression recognition using multi-layer perceptron, 1998, Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition.

[18] Narendra Ahuja et al. A SNoW-Based Face Detector, 1999, NIPS.

[19] Biing-Hwang Juang et al. Fundamentals of speech recognition, 1993, Prentice Hall signal processing series.

[20] Terrence J. Sejnowski et al. An Information-Maximization Approach to Blind Separation and Blind Deconvolution, 1995, Neural Computation.

[21] J. Cohn et al. Automated face analysis by feature point tracking has high concurrent validity with manual FACS coding, 1999, Psychophysiology.

[22] Alex Pentland et al. Coding, Analysis, Interpretation, and Recognition of Facial Expressions, 1997, IEEE Trans. Pattern Anal. Mach. Intell.

[23] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence, 2002, Neural Computation.

[24] Marian Stewart Bartlett et al. Classifying Facial Actions, 1999, IEEE Trans. Pattern Anal. Mach. Intell.

[25] John Daugman. Neural networks for image transformation, analysis, and compression, 1988, Neural Networks.

[26] Penio S. Penev et al. Local feature analysis: A general statistical theory for object representation, 1996.

[27] Timothy F. Cootes et al. Automatic Interpretation and Coding of Face Images Using Flexible Models, 1997, IEEE Trans. Pattern Anal. Mach. Intell.

[28] Joachim M. Buhmann et al. Distortion Invariant Object Recognition in the Dynamic Link Architecture, 1993, IEEE Trans. Computers.

[29] Harry Wechsler et al. The FERET database and evaluation procedure for face-recognition algorithms, 1998, Image Vis. Comput.

[30] David J. Field. What Is the Goal of Sensory Coding?, 1994, Neural Computation.

[31] P. Ekman et al. Smiles when lying, 1988, Journal of Personality and Social Psychology.

[32] Marian Stewart Bartlett et al. Face image analysis by unsupervised learning, 2001.

[33] Shlomo Nir et al. NATO ASI Series, 1995.

[34] Larry S. Davis et al. Human expression recognition from motion using a radial basis function network architecture, 1996, IEEE Trans. Neural Networks.

[35] Evan C. Smith. A SNoW-Based Automatic Facial Feature Detector.

[36] Javier R. Movellan et al. Dynamic Features for Visual Speechreading: A Systematic Comparison, 1996, NIPS.

[37] T. Sejnowski et al. Measuring facial expressions by computer image analysis, 1999, Psychophysiology.

[38] Takeo Kanade et al. Recognizing Action Units for Facial Expression Analysis, 2001, IEEE Trans. Pattern Anal. Mach. Intell.

[39] Marian Stewart Bartlett et al. Classifying Facial Action, 1995, NIPS.

[40] Vicki Bruce et al. Face Recognition: From Theory to Applications, 1999.

[41] Takeo Kanade et al. Comprehensive database for facial expression analysis, 2000, Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition.

[42] Larry S. Davis et al. Recognizing Human Facial Expressions From Long Image Sequences Using Optical Flow, 1996, IEEE Trans. Pattern Anal. Mach. Intell.

[43] Eero P. Simoncelli. Statistical models for images: compression, restoration and synthesis, 1997, Conference Record of the Thirty-First Asilomar Conference on Signals, Systems and Computers.

[44] Matthew Brand et al. Flexible flow for 3D nonrigid tracking and shape recovery, 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001).

[45] J. Movellan et al. Are Your Eyes Smiling? Detecting genuine smiles with support vector machines and Gabor wavelets, 2001.

[46] P. Ekman et al. Facial action coding system: a technique for the measurement of facial movement, 1978.

[47] Garrison W. Cottrell et al. Representing Face Images for Emotion Classification, 1996, NIPS.

[48] Ajit Singh et al. Optic flow computation: a unified perspective, 1991.

[49] Paul Mineiro et al. A Monte Carlo EM Approach for Partially Observable Diffusion Processes: Theory and Applications to Neural Networks, 2002, Neural Computation.

[50] Terrence J. Sejnowski et al. The “independent components” of natural scenes are edge filters, 1997, Vision Research.

[51] Demetri Terzopoulos et al. Analysis and Synthesis of Facial Image Sequences Using Physical and Anatomical Models, 1993, IEEE Trans. Pattern Anal. Mach. Intell.

[52] Pertti Roivainen et al. 3-D Motion Estimation in Model-Based Facial Image Coding, 1993, IEEE Trans. Pattern Anal. Mach. Intell.