Multi-conditional Latent Variable Model for Joint Facial Action Unit Detection

We propose a novel multi-conditional latent variable model for simultaneous fusion of facial features and detection of facial action units. Our approach exploits the structure-discovery capabilities of generative models, such as Gaussian processes, together with the discriminative power of classifiers, such as the logistic function. This leads to superior performance compared to existing classifiers for the target task, which exploit either the discriminative or the generative property, but not both. Model learning is performed via an efficient, newly proposed Bayesian learning strategy based on Monte Carlo sampling. Consequently, the learned model is robust to overfitting, regardless of the number of input features and of jointly estimated facial action units. Extensive qualitative and quantitative experimental evaluations are performed on three publicly available datasets (CK+, Shoulder-pain and DISFA). We show that the proposed model outperforms state-of-the-art methods for the target task on (i) feature fusion, and (ii) multiple facial action unit detection.
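To make the generative/discriminative combination concrete, the following is a minimal sketch of one way a multi-conditional objective over a shared latent space could be written. It is not the model or learning procedure of the paper: the RBF kernel, the fixed hyperparameters, the weighting factor `lam`, and all function names are assumptions chosen for illustration, and the sketch only evaluates a point objective rather than performing the Monte Carlo based Bayesian learning described above.

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0, variance=1.0, noise=1e-2):
    """RBF covariance over latent points X (N x Q) with a noise term on the diagonal.
    Hyperparameter values are illustrative assumptions."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2) + noise * np.eye(X.shape[0])

def gp_neg_log_lik(X, Y):
    """Generative term: negative log-likelihood of features Y (N x D)
    under independent GP mappings from the latent points X."""
    N, D = Y.shape
    L = np.linalg.cholesky(rbf_kernel(X))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))  # K^{-1} Y
    log_det = 2.0 * np.sum(np.log(np.diag(L)))           # log |K|
    return 0.5 * D * log_det + 0.5 * np.sum(Y * alpha) + 0.5 * N * D * np.log(2.0 * np.pi)

def logistic_neg_log_lik(X, Z, W, b):
    """Discriminative term: summed logistic loss for C binary action-unit
    labels Z (N x C), predicted from X with weights W (Q x C) and bias b (C)."""
    logits = X @ W + b
    return np.sum(np.logaddexp(0.0, logits) - Z * logits)

def multi_conditional_objective(X, feature_sets, Z, W, b, lam=1.0):
    """Sum of the generative terms (one per feature set, which is where fusion
    happens) and the discriminative term, weighted by the assumed factor lam."""
    generative = sum(gp_neg_log_lik(X, Y) for Y in feature_sets)
    return generative + lam * logistic_neg_log_lik(X, Z, W, b)
```

In this sketch, every feature set is explained from the same latent points `X`, and all action-unit labels are predicted from those same points, so minimizing the objective over `X`, `W` and `b` would trade off reconstruction of the features against joint detection accuracy.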
