Dynamics of Facial Expression Extracted Automatically from Video

We present a systematic comparison of machine learning methods, including AdaBoost, support vector machines, and linear discriminant analysis, applied to the problem of fully automatic recognition of facial expressions. Each video frame is first scanned in real time to detect approximately upright, frontal faces. The faces found are scaled into image patches of equal size, convolved with a bank of Gabor energy filters, and then passed to a recognition engine that codes facial expressions into 7 dimensions in real time: neutral, anger, disgust, fear, joy, sadness, and surprise. We report results on a series of experiments comparing spatial frequency ranges, feature selection techniques, and recognition engines. The best results were obtained by selecting a subset of Gabor filters using AdaBoost and then training support vector machines on the outputs of the selected filters. Generalization performance to new subjects on a 7-way forced choice was 93% correct or better on two publicly available datasets, the best performance reported so far on these datasets. Surprisingly, registration of internal facial features was not necessary, even though the face detector does not provide precisely registered images. The outputs of the classifier change smoothly as a function of time and thus can be used for unobtrusive motion capture. We developed an end-to-end system that provides facial expression codes at 24 frames per second and animates a computer-generated character. In real time, this expression mirror operates down to resolutions of 16 pixels from eye to eye. We also applied the system to fully automated facial action coding.
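The feature selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each column of `X` stands in for the output of one Gabor filter, uses single-threshold decision stumps as weak learners, handles only a binary (+1/-1) problem rather than the 7-way task, and skips already-chosen features in later rounds. The names `best_stump`, `adaboost_select`, and `make_toy_data` are hypothetical. The selected column indices are what would then be passed on to a support vector machine, which is not shown here.

```python
import math
import random


def best_stump(X, y, w, feature):
    """Return (weighted_error, threshold, polarity) of the best stump on one feature."""
    values = sorted(set(row[feature] for row in X))
    # Candidate thresholds: midpoints between consecutive distinct values.
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])] or [values[0]]
    best = (float("inf"), 0.0, 1)
    for t in thresholds:
        for polarity in (1, -1):
            # The stump predicts `polarity` above the threshold, `-polarity` otherwise.
            err = sum(
                wi
                for row, yi, wi in zip(X, y, w)
                if (polarity if row[feature] > t else -polarity) != yi
            )
            if err < best[0]:
                best = (err, t, polarity)
    return best


def adaboost_select(X, y, n_rounds):
    """Pick one feature per boosting round; return the chosen column indices in order."""
    n = len(X)
    w = [1.0 / n] * n  # start from uniform example weights
    selected = []
    for _ in range(n_rounds):
        # Assumption of this sketch: a feature can be chosen at most once.
        candidates = [
            (best_stump(X, y, w, f), f)
            for f in range(len(X[0]))
            if f not in selected
        ]
        (err, t, pol), f = min(candidates)
        selected.append(f)
        # Clamp the error so the log below stays finite.
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        # Increase the weight of misclassified examples, decrease the rest.
        w = [
            wi * math.exp(-alpha * yi * (pol if row[f] > t else -pol))
            for row, yi, wi in zip(X, y, w)
        ]
        total = sum(w)
        w = [wi / total for wi in w]
    return selected


def make_toy_data(n=40, n_features=5, informative=2, seed=0):
    """Synthetic stand-in for filter outputs: one informative column, the rest noise."""
    rng = random.Random(seed)
    y = [1 if i % 2 == 0 else -1 for i in range(n)]
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n)]
    for row, yi in zip(X, y):
        row[informative] = yi + rng.gauss(0, 0.1)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(adaboost_select(X, y, n_rounds=3))
```

In this toy setup the informative column is picked in the first round, and the remaining rounds choose among the noise columns. With real filter outputs, the number of rounds controls how many filters survive selection and therefore the dimensionality of the input to the downstream classifier.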
