Multimodal 2D+3D Facial Expression Recognition With Deep Fusion Convolutional Neural Network

This paper presents a novel and efficient deep fusion convolutional neural network (DF-CNN) for multimodal 2D+3D facial expression recognition (FER). DF-CNN comprises a feature extraction subnet, a feature fusion subnet, and a softmax layer. In particular, each textured three-dimensional (3D) face scan is represented as six types of 2D facial attribute maps (i.e., a geometry map, three normal maps, a curvature map, and a texture map), all of which are jointly fed into DF-CNN for feature learning and fusion learning, resulting in a highly concentrated 32-dimensional facial representation. Expression prediction is performed in two ways: 1) learning linear support vector machine classifiers on the 32-dimensional fused deep features, or 2) directly performing softmax prediction using the six-dimensional expression probability vectors. Unlike existing 3D FER methods, DF-CNN combines feature learning and fusion learning into a single end-to-end training framework. To demonstrate the effectiveness of DF-CNN, we conducted comprehensive experiments comparing its performance with handcrafted features, pre-trained deep features, fine-tuned deep features, and state-of-the-art methods on three 3D face datasets (i.e., BU-3DFE Subset I, BU-3DFE Subset II, and Bosphorus Subset). In all cases, DF-CNN consistently achieved the best results. To the best of our knowledge, this is the first work to introduce deep CNNs to 3D FER and deep learning-based feature-level fusion to multimodal 2D+3D FER.
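The data flow described in the abstract (six attribute maps, a 32-dimensional fused feature, a six-way softmax) can be illustrated with a minimal sketch. This is not the authors' network: the convolutional feature extraction subnet is skipped entirely, the fusion subnet is reduced to one dense layer with ReLU, all weights are random and untrained, and the per-map feature size `PER_MAP_DIM` and every function name are hypothetical choices made for illustration only.

```python
import math
import random

NUM_MAPS = 6        # geometry, three normals, curvature, texture (from the abstract)
FUSED_DIM = 32      # size of the fused representation (from the abstract)
NUM_CLASSES = 6     # six expression classes (from the abstract)
PER_MAP_DIM = 16    # hypothetical per-map feature size, chosen for illustration


def linear(x, weights, bias):
    """Dense layer on plain Python lists: weights @ x + bias."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(rng, out_dim, in_dim):
    """Random, untrained weights standing in for learned parameters."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def fuse_and_predict(map_features, fusion_layer, classifier_layer):
    """Concatenate per-map features, project to 32-D, then apply softmax."""
    concatenated = [v for feats in map_features for v in feats]
    fused = relu(linear(concatenated, *fusion_layer))
    probs = softmax(linear(fused, *classifier_layer))
    return fused, probs


rng = random.Random(0)
fusion_layer = random_layer(rng, FUSED_DIM, NUM_MAPS * PER_MAP_DIM)
classifier_layer = random_layer(rng, NUM_CLASSES, FUSED_DIM)

# Stand-in features for the six attribute maps of one face scan.
map_features = [[rng.random() for _ in range(PER_MAP_DIM)] for _ in range(NUM_MAPS)]

fused, probs = fuse_and_predict(map_features, fusion_layer, classifier_layer)
predicted = max(range(NUM_CLASSES), key=lambda i: probs[i])
print(len(fused), len(probs), predicted)
```

In this sketch, `probs` corresponds to the second prediction route (direct softmax), while `fused` is the kind of 32-dimensional vector that the first route would pass to a separately trained linear SVM.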
