Context-Sensitive Conditional Ordinal Random Fields for Facial Action Intensity Estimation

We address the problem of modeling intensity levels of facial actions in video sequences. The intensity sequences often exhibit large variability due to contextual factors such as person-specific facial expressiveness or changes in illumination. Existing methods typically attempt to normalize this variability in the data using different feature-selection and/or data pre-processing schemes. Consequently, they ignore the context in which the target facial actions occur. We propose a novel Conditional Random Field (CRF) based ordinal model for context-sensitive modeling of facial action unit intensity, where the W5+ (Who, When, What, Where, Why and How) definition of context is used. In particular, we focus on three contextual questions: Who (the observed person), How (the changes in facial expressions), and When (the timing of the facial expression intensities). The contextual questions Who and How are modeled by means of newly introduced covariate effects, while the contextual question When is modeled in terms of temporal correlation between the intensity levels. We also introduce a weighted softmax-margin learning of CRFs from data with a skewed distribution of intensity levels, as commonly encountered in spontaneous facial data. The proposed model is evaluated for intensity estimation of facial action units and of facial expressions of pain on the UNBC-McMaster Shoulder Pain dataset. Our experimental results show the effectiveness of the proposed approach.
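Two ingredients mentioned in the abstract can be illustrated concretely: the ordinal (cumulative-link) likelihood that underlies ordinal CRF models, and inverse-frequency weighting as one simple way to counter a skewed distribution of intensity levels. The sketch below is illustrative only; the function and variable names are assumptions for exposition and do not come from the paper, whose actual model couples these terms with covariate effects and temporal potentials inside a CRF.

```python
import numpy as np

def ordinal_log_likelihood(score, y, thresholds):
    """Cumulative-link ordinal log-likelihood for one frame.

    P(y = k) = sigmoid(b_{k+1} - score) - sigmoid(b_k - score),
    with b_0 = -inf and b_K = +inf (McCullagh-style cut points).
    `score` is a scalar projection of the frame's features;
    `thresholds` is an increasing array of K-1 cut points.
    """
    b = np.concatenate(([-np.inf], np.asarray(thresholds, float), [np.inf]))
    with np.errstate(over="ignore"):  # exp(inf) at the boundary cut points is benign
        sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
        p = sigmoid(b[y + 1] - score) - sigmoid(b[y] - score)
    return np.log(p)

def class_weights(labels, num_levels):
    """Inverse-frequency weights: rare intensity levels get larger weight,
    so a weighted loss is not dominated by the (usually huge) neutral class."""
    counts = np.bincount(labels, minlength=num_levels).astype(float)
    counts[counts == 0] = 1.0  # guard against levels absent from the data
    return counts.sum() / (num_levels * counts)
```

For any fixed `score` and `thresholds`, the per-level probabilities sum to one, so the likelihood is properly normalized; the weights can then multiply each frame's loss term during training.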
