Deep Region and Multi-label Learning for Facial Action Unit Detection

Region learning (RL) and multi-label learning (ML) have recently attracted increasing attention in the field of facial Action Unit (AU) detection. Because AUs are active in sparse facial regions, RL aims to identify these regions for better specificity. On the other hand, strong statistical evidence of AU correlations suggests that ML is a natural way to model the detection task. In this paper, we propose Deep Region and Multi-label Learning (DRML), a unified deep network that addresses these two problems simultaneously. One crucial aspect of DRML is a novel region layer that uses feed-forward functions to induce important facial regions, forcing the learned weights to capture structural information of the face. Our region layer serves as an alternative design between locally connected layers (i.e., kernels confined to individual pixels) and conventional convolution layers (i.e., kernels shared across an entire image). Unlike previous studies that solve RL and ML alternately, DRML addresses both problems by construction, allowing the two seemingly unrelated problems to interact more directly. The complete network is end-to-end trainable and automatically learns representations robust to variations inherent within a local region. Experiments on the BP4D and DISFA benchmarks show that DRML achieves the highest average F1-score and AUC within and across datasets in comparison with alternative methods.
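
To make the contrast between shared and per-region kernels concrete, the following is a minimal NumPy sketch, not the DRML implementation. It assumes a single-channel feature map split into a uniform grid, with one kernel per grid cell: weights are shared within a cell but not across cells. The multi-label side is illustrated with a sigmoid cross-entropy over AU labels, which is a common choice assumed here rather than the loss stated in the abstract. The function names `conv2d_same`, `region_layer`, and `multilabel_loss`, the grid size, and the kernel size are all hypothetical.

```python
import numpy as np


def conv2d_same(x, kernel):
    """Single-channel 2D cross-correlation with zero padding ('same' size)."""
    kh, kw = kernel.shape
    assert kh % 2 == 1 and kw % 2 == 1, "sketch assumes odd kernel sizes"
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x, ((ph, ph), (pw, pw)))  # zero padding by default
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out


def region_layer(x, kernels):
    """Apply a separate kernel to each cell of a uniform grid.

    x:       (H, W) feature map, H and W divisible by the grid size.
    kernels: (G, G, k, k) array, one k-by-k kernel per grid cell (assumed layout).
    """
    grid = kernels.shape[0]
    h, w = x.shape
    assert h % grid == 0 and w % grid == 0, "sketch assumes an even split"
    ph, pw = h // grid, w // grid
    out = np.zeros_like(x, dtype=float)
    for r in range(grid):
        for c in range(grid):
            rows = slice(r * ph, (r + 1) * ph)
            cols = slice(c * pw, (c + 1) * pw)
            # Weights are shared inside this cell only, not across the image.
            out[rows, cols] = conv2d_same(x[rows, cols], kernels[r, c])
    return out


def multilabel_loss(logits, labels):
    """Mean sigmoid cross-entropy over AUs (assumed loss, one logit per AU)."""
    p = 1.0 / (1.0 + np.exp(-logits))
    return -np.mean(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p))
```

Setting the grid size to 1 in this sketch recovers an ordinary convolution with one shared kernel, while shrinking each cell to a single pixel approaches a locally connected layer; the per-region design sits between those two extremes.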
