Copula Ordinal Regression Framework for Joint Estimation of Facial Action Unit Intensity

Joint modeling of the intensity of multiple facial action units (AUs) from face images is challenging due to the large number of AUs (30+) and their intensity levels (6). The difficulty stems partly from the lack of models that can efficiently handle such a large number of outputs/classes simultaneously, and partly from the lack of suitable data to train such models on. For this reason, the majority of methods resort to independent classifiers for AU intensity. This is suboptimal for at least two reasons: the facial appearance of some AUs changes depending on the intensity of other AUs, and some AUs co-occur more often than others. To address this, we propose a copula ordinal regression approach for modeling multivariate ordinal variables. Our model accounts for the ordinal structure of the output variables and for their non-linear dependencies via copula functions modeled as cliques of a conditional random field. The copula ordinal regression model achieves joint learning and inference of the intensities of multiple AUs while remaining computationally tractable. We demonstrate the effectiveness of our approach on three challenging datasets of naturalistic facial expressions, and we show that the estimation of target AU intensities improves especially in the case of (a) noisy image features, (b) head-pose variations and (c) imbalanced training data. Lastly, we show that the proposed approach consistently outperforms (i) independent modeling of AU intensities, (ii) the state-of-the-art approach for the target task, and (iii) deep convolutional neural networks.
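To make the idea of coupling ordinal outputs through a copula more concrete, the following is a minimal sketch of a single pairwise clique only, not the full model or its learning procedure. It assumes ordered-logit margins and a Frank copula; the function names, the threshold values and the dependence parameter `theta` are all hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np

def ordinal_cdf(score, thresholds):
    """Cumulative probabilities P(y <= k) for k = -1, 0, ..., K-1.

    Assumes an ordered-logit margin: P(y <= k) = sigmoid(b_k - score),
    padded with 0 (k = -1) and 1 (k = K-1).
    """
    inner = 1.0 / (1.0 + np.exp(-(np.asarray(thresholds, dtype=float) - score)))
    return np.concatenate(([0.0], inner, [1.0]))

def frank_copula(u, v, theta):
    """Frank copula CDF C(u, v); theta must be non-zero."""
    num = np.expm1(-theta * u) * np.expm1(-theta * v)
    return -np.log1p(num / np.expm1(-theta)) / theta

def joint_pmf(score1, score2, thresholds, theta):
    """Joint probability table of two ordinal variables with K levels each.

    The copula is evaluated on the grid of marginal CDF values, and each
    cell probability is obtained by rectangle differencing of the joint CDF.
    """
    F1 = ordinal_cdf(score1, thresholds)
    F2 = ordinal_cdf(score2, thresholds)
    C = frank_copula(F1[:, None], F2[None, :], theta)
    return C[1:, 1:] - C[:-1, 1:] - C[1:, :-1] + C[:-1, :-1]

# Hypothetical example: two AUs, 6 intensity levels (5 thresholds).
thresholds = [-2.0, -1.0, 0.0, 1.0, 2.0]
pmf = joint_pmf(score1=0.3, score2=-0.5, thresholds=thresholds, theta=4.0)
most_likely_pair = np.unravel_index(np.argmax(pmf), pmf.shape)
```

By construction the table sums to one and its row and column sums reproduce the two ordinal margins, so the copula changes only how the two intensities depend on each other, not their individual distributions.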
