Context-Sensitive Dynamic Ordinal Regression for Intensity Estimation of Facial Action Units

Modeling the intensity of facial action units from spontaneously displayed facial expressions is challenging, mainly because of high variability in subject-specific facial expressiveness, head movements, illumination changes, etc. These factors make the target problem highly context-sensitive, yet existing methods usually ignore this context-sensitivity. We propose a novel Conditional Ordinal Random Field (CORF) model for context-sensitive modeling of facial action unit intensity, where the W5+ (who, when, what, where, why and how) definition of context is used. While the proposed model is general enough to handle all six context questions, in this paper we focus on three of them: who (the observed subject), how (the changes in facial expressions), and when (the timing of facial expressions and their intensity). The context questions who and how are modeled by means of the newly introduced context-dependent covariate effects, and the context question when is modeled in terms of temporal correlation between the ordinal outputs, i.e., the intensity levels of action units. We also introduce a weighted softmax-margin learning of CRFs from data with a skewed distribution of intensity levels, which is commonly encountered in spontaneous facial data. The proposed model is evaluated on intensity estimation of pain and facial action units using two recently published datasets (UNBC Shoulder Pain and DISFA) of spontaneously displayed facial expressions. Our experiments show that the proposed model performs significantly better on the target tasks than the state-of-the-art approaches. Furthermore, we show that, compared to traditional learning of CRFs, the proposed weighted learning results in more robust parameter estimation from the imbalanced intensity data.
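To make two ingredients of the abstract concrete, the following is a minimal Python sketch of (a) an ordinal output model over intensity levels and (b) class weighting for a skewed label distribution. It is not the CORF model and not the weighted softmax-margin objective of the paper: it assumes a standard cumulative-logit ordinal model, inverse-frequency class weights, and independent frames (no temporal correlation, no context-dependent covariate effects). All function names and parameters are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ordinal_probs(score, thresholds):
    """Return P(y = k | score) for k = 0..K-1 under a cumulative-logit model.

    thresholds is an increasing list of K-1 cut points (an assumption of
    this sketch); P(y <= k) = sigmoid(thresholds[k] - score).
    """
    cdf = [sigmoid(b - score) for b in thresholds] + [1.0]
    probs, prev = [], 0.0
    for c in cdf:
        probs.append(c - prev)  # difference of adjacent cumulative probabilities
        prev = c
    return probs


def inverse_frequency_weights(labels, num_levels):
    """Weight each intensity level by n / (K * count), so rare levels count more.

    Levels that never occur in labels get weight 0.0.
    """
    counts = Counter(labels)
    n = len(labels)
    return [n / (num_levels * counts[k]) if counts[k] else 0.0
            for k in range(num_levels)]


def weighted_nll(scores, labels, thresholds, weights):
    """Class-weighted negative log-likelihood, averaged over frames."""
    total = 0.0
    for s, y in zip(scores, labels):
        p = ordinal_probs(s, thresholds)[y]
        total -= weights[y] * math.log(max(p, 1e-12))  # guard against log(0)
    return total / len(labels)
```

With labels dominated by the lowest intensity level, as is typical for spontaneous data, the weights returned by `inverse_frequency_weights` increase the contribution of the rare high-intensity frames to the loss. A sequence model such as the one described above would additionally couple neighbouring frames, which this sketch deliberately omits.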