Joint Representation and Estimator Learning for Facial Action Unit Intensity Estimation

Facial action unit (AU) intensity is an index that characterizes human facial expressions. Accurate AU intensity estimation depends on three major elements: image representation, intensity estimator, and supervisory information. Most existing methods learn an intensity estimator on a fixed image representation and rely on fully annotated supervisory information being available. In this paper, a novel general framework for AU intensity estimation is presented, which differs from traditional estimation methods in two aspects. First, rather than keeping the image representation fixed, it learns the representation and the intensity estimator simultaneously, so that the two are optimized jointly rather than in isolation. Second, it can incorporate weak supervisory training signals derived from human knowledge (e.g., feature smoothness, label smoothness, label ranking, and positive label), which makes the model trainable even when fully annotated information is not available. More specifically, human knowledge is represented as either soft or hard constraints, which are encoded as regularization terms or as equality/inequality constraints, respectively. On top of this framework, we propose an efficient optimization algorithm based on the Alternating Direction Method of Multipliers (ADMM). Evaluations on two benchmark databases show that our method outperforms competing methods under different ratios of AU intensity annotations, especially when the ratio is small.
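To make the soft/hard distinction concrete, here is a minimal Python sketch under several simplifying assumptions. It is not the authors' algorithm. The image representation is kept fixed (the framework above learns it jointly), the estimator is linear, label smoothness is encoded as a graph-Laplacian regularizer over consecutive frames (an assumed neighborhood), and label ranking is treated as a hard inequality constraint. Feature smoothness and positive-label knowledge are omitted. The constrained problem is solved with scaled-form ADMM by splitting the ranking margins into a nonnegative slack variable. All function names (`admm_estimator`, `chain_laplacian`, and the helpers) and parameters (`beta`, `gamma`, `rho`) are hypothetical.

```python
"""Hypothetical sketch: fixed features, linear estimator, soft label
smoothness (graph Laplacian) and hard ranking constraints solved by ADMM."""


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(M, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def chain_laplacian(n):
    """Laplacian of a graph linking consecutive frames (assumed neighborhood)."""
    L = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        L[i][i] += 1.0
        L[i + 1][i + 1] += 1.0
        L[i][i + 1] -= 1.0
        L[i + 1][i] -= 1.0
    return L


def admm_estimator(X, labeled, y, ranks, beta=0.1, gamma=1e-3, rho=1.0, iters=200):
    """Minimize 0.5*sum_{i in labeled}(x_i.w - y_i)^2 + 0.5*beta*f'Lf + 0.5*gamma*|w|^2,
    with f = Xw, subject to x_i.w >= x_j.w for every (i, j) in ranks.

    ADMM splits s = Aw (each row of A is x_i - x_j) with s >= 0; u is the scaled dual."""
    d = len(X[0])
    XL = [X[i] for i in labeled]
    A = [[a - b for a, b in zip(X[i], X[j])] for i, j in ranks]
    H_fit = matmul(transpose(XL), XL)                                    # labeled-fit term
    H_smooth = matmul(matmul(transpose(X), chain_laplacian(len(X))), X)  # soft: label smoothness
    # The w-update matrix never changes: M = H_fit + beta*H_smooth + gamma*I + rho*A'A
    M = [[H_fit[r][c] + beta * H_smooth[r][c] + (gamma if r == c else 0.0)
          + rho * sum(a[r] * a[c] for a in A) for c in range(d)] for r in range(d)]
    b0 = matvec(transpose(XL), y)
    s = [0.0] * len(A)
    u = [0.0] * len(A)
    w = [0.0] * d
    for _ in range(iters):
        v = [si - ui for si, ui in zip(s, u)]
        rhs = [b0[k] + rho * sum(a[k] * vi for a, vi in zip(A, v)) for k in range(d)]
        w = solve(M, rhs)                                # w-update: small linear system
        Aw = matvec(A, w)
        s = [max(0.0, z + ui) for z, ui in zip(Aw, u)]   # s-update: project onto s >= 0 (hard)
        u = [ui + z - si for ui, z, si in zip(u, Aw, s)] # scaled dual update
    return w


if __name__ == "__main__":
    # Toy data: one feature (frame index) plus a bias column; only frames 0 and 1 labeled.
    X = [[float(t), 1.0] for t in range(5)]
    w = admm_estimator(X, labeled=[0, 1], y=[1.0, 0.0], ranks=[(4, 0)])
    # The labels alone suggest a negative slope; the ranking constraint
    # "frame 4 is at least as intense as frame 0" pushes the slope to about 0.
    print(w, matvec(X, w))
```

Because the w-update matrix `M` is the same in every iteration, a practical implementation would factor it once; the sketch simply re-solves the small system each time for clarity.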
