Orthogonalization-Guided Feature Fusion Network for Multimodal 2D+3D Facial Expression Recognition

Because 2D and 3D data present different views of the same face, the features extracted from them can be both complementary and redundant. In this paper, we present a novel and efficient orthogonalization-guided feature fusion network, OGF2Net, which fuses the features extracted from 2D and 3D faces for facial expression recognition. The 2D texture maps are fed into a 2D feature extraction pipeline (FE2DNet), while the attribute maps generated from the 3D data are concatenated and used as the input of a 3D feature extraction pipeline (FE3DNet). The two networks are trained separately in the first stage and frozen in the second stage for late feature fusion, which mitigates the lack of a large number of 2D+3D face pairs. To reduce the redundancy between the features extracted from the 2D and 3D streams, we design an orthogonal-loss-guided feature fusion network that orthogonalizes the features before fusing them. Experimental results show that the proposed method significantly outperforms state-of-the-art algorithms on both the BU-3DFE and Bosphorus databases. Accuracies as high as 89.05% (P1 protocol) and 89.07% (P2 protocol) are achieved on the BU-3DFE database, and an accuracy of 89.28% is achieved on the Bosphorus database. The complexity analysis also suggests that our approach runs faster while requiring less memory.
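The idea of penalizing redundancy between the two streams before fusion can be illustrated with a minimal sketch. The abstract does not give the exact form of the orthogonal loss, so the squared-cosine penalty below, the function names `cosine`, `orthogonality_penalty`, and `fuse`, and the assumption that both streams produce feature vectors of equal length are illustrative choices only, not the authors' implementation.

```python
import math


def cosine(u, v, eps=1e-12):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + eps)


def orthogonality_penalty(feats_2d, feats_3d):
    """Mean squared cosine similarity over paired 2D/3D feature vectors.

    Assumed stand-in for an orthogonal loss: it is 0 when every pair is
    orthogonal and approaches 1 when every pair is collinear.
    """
    sims = [cosine(u, v) ** 2 for u, v in zip(feats_2d, feats_3d)]
    return sum(sims) / len(sims)


def fuse(feats_2d, feats_3d):
    """Late fusion by concatenating each 2D vector with its 3D partner."""
    return [list(u) + list(v) for u, v in zip(feats_2d, feats_3d)]


if __name__ == "__main__":
    f2d = [[1.0, 0.0], [0.0, 2.0]]  # hypothetical 2D-stream features
    f3d = [[0.0, 3.0], [5.0, 0.0]]  # hypothetical 3D-stream features
    print(orthogonality_penalty(f2d, f3d))
    print(fuse(f2d, f3d))
```

In a training setup, a term of this kind would typically be added to the classification loss with a weighting factor, so that the fusion layers are pushed toward non-redundant 2D and 3D representations while the frozen extractors stay unchanged.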
