Multi-view facial action unit detection via DenseNets and CapsNets

Although standard convolutional neural networks (CNNs) have been applied to improve the robustness of facial action unit (AU) detection to pose variations, their detection performance remains limited because standard CNNs are not sufficiently robust to affine transformations. To address this issue, this work proposes two novel architectures, termed AUCaps and AUCaps++, for multi-view and multi-label facial AU detection. In both architectures, one or more dense blocks are stacked before a capsule network (CapsNet). Specifically, the dense blocks prefixed to the CapsNet learn more discriminative high-level AU features, while the CapsNet learns more view-invariant AU features. Moreover, the capsule types and the digit-capsule dimension are optimized to avoid the computation and storage burden caused by dynamic routing in standard CapsNets. Because AUCaps and AUCaps++ are trained by jointly optimizing a multi-label AU loss and a viewpoint-image reconstruction loss, the proposed method achieves high F1 scores and learns a rough reconstruction of the human face across different AUs. Within-dataset and cross-dataset results on two public datasets show that the average F1 scores of the proposed method outperform competitors using hand-crafted or deep-learning features by a large margin.
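The joint objective described above combines a multi-label AU classification loss with a viewpoint-image reconstruction loss. The following is a minimal NumPy sketch of that combination, not the authors' implementation: it assumes sigmoid AU probabilities, uses binary cross-entropy for the multi-label term and pixel-wise squared error for the reconstruction term, and the down-weighting factor `lam` is a hypothetical hyperparameter.

```python
import numpy as np

def joint_loss(au_probs, au_labels, recon, image, lam=0.0005):
    """Sketch of the joint objective: multi-label AU binary cross-entropy
    plus a down-weighted reconstruction loss on the viewpoint image.
    `lam` is a hypothetical weighting factor, not taken from the paper."""
    eps = 1e-7
    p = np.clip(au_probs, eps, 1.0 - eps)  # avoid log(0)
    # Multi-label binary cross-entropy, averaged over the AU labels
    bce = -np.mean(au_labels * np.log(p) + (1 - au_labels) * np.log(1 - p))
    # Reconstruction loss: squared error between reconstructed and input image
    mse = np.sum((recon - image) ** 2)
    return bce + lam * mse
```

Minimizing the first term drives AU detection, while the second term regularizes the capsule features by forcing them to retain enough information to roughly reconstruct the face.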
