Landmark-Aware and Part-based Ensemble Transfer Learning Network for Facial Expression Recognition from Static images

Facial Expression Recognition from static images is a challenging problem in computer vision applications. Convolutional Neural Network (CNN), the state-of-the-art method for various computer vision tasks, has had limited success in predicting expressions from faces having extreme poses, illumination, and occlusion conditions. To mitigate this issue, CNNs are often accompanied by techniques like transfer, multi-task, or ensemble learning that often provide high accuracy at the cost of high computational complexity. In this work, we propose a Part-based Ensemble Transfer Learning network, which models how humans recognize facial expressions by correlating the spatial orientation pattern of the facial features with a specific expression. It consists of 5 sub-networks, in which each sub-network performs transfer learning from one of the five subsets of facial landmarks: eyebrows, eyes, nose, mouth, or jaw to expression classification. We test the proposed network on the CK+, JAFFE, and SFEW datasets, and it outperforms the benchmark for CK+ and JAFFE datasets by 0.51% and 5.34%, respectively. Additionally, it consists of a total of 1.65M model parameters and requires only 3.28 × 106 FLOPS, which ensures computational efficiency for real-time deployment. Grad-CAM visualizations of our proposed ensemble highlight the complementary nature of its sub-networks, a key design parameter of an effective ensemble network. Lastly, cross-dataset evaluation results reveal that our proposed ensemble has a high generalization capacity. Our model trained on the SFEW Train dataset achieves an accuracy of 47.53% on the CK+ dataset, which is higher than what it achieves on the SFEW Valid dataset.

[1]  Cha Zhang,et al.  Image based Static Facial Expression Recognition with Multiple Deep Network Learning , 2015, ICMI.

[2]  P. Ekman,et al.  What the face reveals : basic and applied studies of spontaneous expression using the facial action coding system (FACS) , 2005 .

[3]  Tamás D. Gedeon,et al.  Collecting Large, Richly Annotated Facial-Expression Databases from Movies , 2012, IEEE MultiMedia.

[4]  Tal Hassner,et al.  Emotion Recognition in the Wild via Convolutional Neural Networks and Mapped Binary Patterns , 2015, ICMI.

[5]  David Masip,et al.  Multi-task, multi-label and multi-domain learning with residual convolutional networks for emotion recognition , 2018, ArXiv.

[6]  Abhishek Das,et al.  Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[7]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[8]  Soo-Young Lee,et al.  Hierarchical Committee of Deep CNNs with Exponentially-Weighted Decision Fusion for Static Facial Expression Recognition , 2015, ICMI.

[9]  Dong Liang,et al.  A facial expression recognition system based on supervised locally linear embedding , 2005, Pattern Recognit. Lett..

[10]  John D. Austin,et al.  Adaptive histogram equalization and its variations , 1987 .

[11]  Mohammad H. Mahoor,et al.  Going deeper in facial expression recognition using deep neural networks , 2015, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).

[12]  Jane You,et al.  Adaptive Deep Metric Learning for Identity-Aware Facial Expression Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[13]  H. Song,et al.  A Decade Survey of Transfer Learning (2010–2020) , 2020, IEEE Transactions on Artificial Intelligence.

[14]  Jian Sun,et al.  Face Alignment at 3000 FPS via Regressing Local Binary Features , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Simon Baker,et al.  Active Appearance Models Revisited , 2004, International Journal of Computer Vision.

[16]  Shiguang Shan,et al.  Learning Expressionlets on Spatio-temporal Manifold for Dynamic Facial Expression Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Ping Liu,et al.  Facial Expression Recognition via a Boosted Deep Belief Network , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[19]  Rama Chellappa,et al.  FaceNet2ExpNet: Regularizing a Deep Face Recognition Net for Expression Recognition , 2016, 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017).

[20]  Josephine Sullivan,et al.  One millisecond face alignment with an ensemble of regression trees , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Pawan Sinha,et al.  Qualitative Representations for Recognition , 2002, Biologically Motivated Computer Vision.

[22]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Yuxiao Hu,et al.  MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition , 2016, ECCV.

[24]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[25]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[26]  Jian Sun,et al.  Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[27]  Tamás D. Gedeon,et al.  Static facial expression analysis in tough conditions: Data, evaluation protocol and benchmark , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[28]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[29]  Thomas S. Huang,et al.  Do Deep Neural Networks Learn Facial Action Units When Doing Expression Recognition? , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[30]  Timothy F. Cootes,et al.  Active Appearance Models , 1998, ECCV.

[31]  Stefan Winkler,et al.  Deep Learning for Emotion Recognition on Small Datasets using Transfer Learning , 2015, ICMI.

[32]  Jean Meunier,et al.  Prototype-Based Modeling for Facial Expression Analysis , 2014, IEEE Transactions on Multimedia.

[33]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[34]  Albert Ali Salah,et al.  Video-based emotion recognition in the wild using deep transfer learning and score fusion , 2017, Image Vis. Comput..

[35]  Tamás D. Gedeon,et al.  Video and Image based Emotion Recognition Challenges in the Wild: EmotiW 2015 , 2015, ICMI.

[36]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[37]  Junping Du,et al.  Reliable Crowdsourcing and Deep Locality-Preserving Learning for Expression Recognition in the Wild , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Yoshua Bengio,et al.  Challenges in representation learning: A report on three machine learning contests , 2013, Neural Networks.

[39]  Davis E. King,et al.  Dlib-ml: A Machine Learning Toolkit , 2009, J. Mach. Learn. Res..

[40]  Roland Göcke,et al.  Learning AAM fitting through simulation , 2009, Pattern Recognition.

[41]  David Masip,et al.  Supervised Committee of Convolutional Neural Networks in Automated Facial Expression Analysis , 2018, IEEE Transactions on Affective Computing.

[42]  Razvan Pascanu,et al.  Combining modality specific deep neural networks for emotion recognition in video , 2013, ICMI '13.

[43]  Shaogang Gong,et al.  Facial expression recognition based on Local Binary Patterns: A comprehensive study , 2009, Image Vis. Comput..

[44]  Stefanos Zafeiriou,et al.  300 Faces In-The-Wild Challenge: database and results , 2016, Image Vis. Comput..

[45]  Takeo Kanade,et al.  The Extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-specified expression , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[46]  Michael J. Lyons,et al.  Coding facial expressions with Gabor wavelets , 1998, Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition.

[47]  Anastasios Tefas,et al.  Salient feature and reliable classifier selection for facial expression classification , 2010, Pattern Recognit..

[48]  Graham W. Taylor,et al.  Multi-task Learning of Facial Landmarks and Expression , 2014, 2014 Canadian Conference on Computer and Robot Vision.

[49]  Fei Cheng,et al.  Facial Expression Recognition in JAFFE Dataset Based on Gaussian Process Classification , 2010, IEEE Transactions on Neural Networks.

[50]  Shengcai Liao,et al.  Learning Face Representation from Scratch , 2014, ArXiv.

[51]  Shiguang Shan,et al.  AU-inspired Deep Networks for Facial Expression Feature Learning , 2015, Neurocomputing.

[52]  Yong Du,et al.  Facial Expression Recognition Based on Deep Evolutional Spatial-Temporal Networks , 2017, IEEE Transactions on Image Processing.

[53]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[54]  Shiguang Shan,et al.  AU-aware Deep Networks for facial expression recognition , 2013, 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).

[55]  Zhiyuan Li,et al.  Island Loss for Learning Discriminative Features in Facial Expression Recognition , 2017, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).

[56]  Omkar M. Parkhi,et al.  VGGFace2: A Dataset for Recognising Faces across Pose and Age , 2017, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).

[57]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[58]  Abien Fred Agarap Deep Learning using Rectified Linear Units (ReLU) , 2018, ArXiv.

[59]  Karel J. Zuiderveld,et al.  Contrast Limited Adaptive Histogram Equalization , 1994, Graphics Gems.

[60]  Marwan Mattar,et al.  Labeled Faces in the Wild: A Database forStudying Face Recognition in Unconstrained Environments , 2008 .