Discriminative Attention-based Convolutional Neural Network for 3D Facial Expression Recognition

3D Facial Expression Recognition (FER) is an active research area in computer vision. Although previous methods report promising results, two key issues remain unsolved. First, different facial areas contribute unequally to each expression, yet most existing methods extract features from the entire 3D surface. Second, the degree of difference between expressions varies from pair to pair, yet previous methods generally treat all emotions equally, which makes some of them extremely hard to distinguish. To address these problems, we propose a novel approach for 3D FER, the Discriminative Attention-based Convolutional Neural Network (DA-CNN), which generates more comprehensive expression-related representations. DA-CNN adds an attention module to the CNN model, helping the network focus selectively on emotionally salient regions in a learnable way. Furthermore, a novel loss, the Dimensional Distribution (DD) loss, is proposed to model the inter-expression relationship. Supervised by the DD loss, DA-CNN generates more discriminative expression representations. Extensive experiments are conducted on the BU-3DFE dataset, and the results show that DA-CNN achieves a significant improvement over the state of the art.
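
To make the idea of learnable attention over salient regions concrete, the following is a minimal sketch of a generic soft spatial attention step in NumPy. It is not the attention module used in DA-CNN, whose design the abstract does not specify. The function name `spatial_attention`, the per-channel `weights` vector (standing in for a learned 1x1 convolution), and the random inputs are all assumptions made for illustration.

```python
import numpy as np

def spatial_attention(features, weights, bias=0.0):
    """Reweight a (C, H, W) feature map with a soft spatial mask.

    `weights` has shape (C,) and is a hypothetical stand-in for a learned
    1x1 convolution; in a real network it would be trained end to end.
    """
    # 1x1 convolution across channels: one score per spatial location, shape (H, W)
    scores = np.tensordot(weights, features, axes=([0], [0])) + bias
    # Sigmoid maps each score to a weight between 0 and 1
    mask = 1.0 / (1.0 + np.exp(-scores))
    # Broadcast the mask over channels so low-weight regions are suppressed
    attended = features * mask[None, :, :]
    return attended, mask

# Illustrative inputs only: random features and weights, not real face data
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 4, 4))
weights = rng.standard_normal(8)
attended, mask = spatial_attention(features, weights)
```

Locations where the mask is close to 1 pass their features through almost unchanged, while locations where it is close to 0 are damped, which is the sense in which such a module lets a model weight some facial regions more than others.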
