Variable-state Latent Conditional Random Field models for facial expression analysis

Automated recognition of facial expressions of emotion and detection of facial action units (AUs) from videos depend critically on modeling their dynamics. These dynamics are characterized in part by changes in the temporal phases (onset-apex-offset) and intensity of emotion expressions and AUs. The appearance of these changes can vary considerably across subjects, which makes the recognition and detection tasks very challenging. The state-of-the-art Latent Conditional Random Field (L-CRF) framework encodes these dynamics efficiently through latent states that account for the temporal consistency of emotion expression and the ordinal relationships between its intensity levels. These latent states are typically assumed to be either unordered (nominal) or fully ordered (ordinal). However, while video segments containing an activation of the target AU may be better described by ordinal latent states (corresponding to the AU intensity levels), segments where this AU does not occur may be better described by unordered (nominal) latent states. To address this, we propose the variable-state L-CRF (VSL-CRF) model, which automatically selects the optimal latent states for the target image sequence based on the input data and the underlying dynamics of the sequence. To reduce overfitting, we propose a novel graph-Laplacian regularization of the latent states. We evaluate the VSL-CRF on facial expression recognition using the CK+ dataset and on AU detection using the GEMEP-FERA and DISFA datasets, and show that the proposed model generalizes better than traditional L-CRFs and other related state-of-the-art models.
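The distinction between nominal and ordinal latent states can be illustrated with a minimal sketch. It is not the VSL-CRF itself: it only contrasts, for a single frame, a softmax over unordered states with an ordered-probit (cumulative link) model over ordered states, which is one common way to parameterize the two cases. The function names, parameters, and toy numbers are assumptions introduced here for illustration.

```python
import math


def nominal_state_probs(x, weights):
    """Unordered states: softmax over one linear score per state (assumed form)."""
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def _normal_cdf(t):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def ordinal_state_probs(x, direction, thresholds, sigma=1.0):
    """Ordered states: a single projection of x is cut by increasing thresholds."""
    f = sum(a_i * x_i for a_i, x_i in zip(direction, x))
    cuts = [-math.inf] + sorted(thresholds) + [math.inf]
    cdf = [_normal_cdf((c - f) / sigma) for c in cuts]
    # Probability of state k is the normal mass between consecutive cut points.
    return [cdf[k + 1] - cdf[k] for k in range(len(cuts) - 1)]


# Toy frame feature and hypothetical parameters for three latent states.
x = [0.5, -1.0]
nominal = nominal_state_probs(x, weights=[[1.0, 0.0], [0.0, 1.0], [-1.0, 1.0]])
ordinal = ordinal_state_probs(x, direction=[1.0, 0.0], thresholds=[-1.0, 1.0])
```

In the nominal case each state has its own weight vector, so no ordering between states is implied. In the ordinal case all states share one projection direction, and moving along it passes through the states in a fixed order, which matches the idea of latent states that follow AU intensity levels.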
