Guided Spatial Transformers for Facial Expression Recognition

Spatial Transformer Networks are a powerful mechanism for learning the most relevant regions of an image, yet they can be made more effective by supplying them with embedded expert knowledge. This paper aims to improve the performance of conventional Spatial Transformers applied to Facial Expression Recognition. Building on the Spatial Transformers' capacity for spatial manipulation within networks, we propose extensions to these models in which effective attentional regions are captured using facial landmarks or facial visual saliency maps. This attentional information is then hardcoded to guide the Spatial Transformers toward the spatial transformations that best fit the proposed regions, yielding better recognition results. We evaluate on two datasets: AffectNet and FER-2013. On AffectNet we achieve a 0.35 percentage-point absolute improvement over the traditional Spatial Transformer, while on FER-2013 our solution gains 1.49 percentage points when models are fine-tuned with the AffectNet pre-trained weights.
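The idea of guiding a Spatial Transformer can be sketched as follows: a standard STN predicts the six parameters of an affine transform from a localization network, and an auxiliary loss pulls that prediction toward a target transform derived offline from facial landmarks or a saliency map. This is a minimal illustrative sketch, not the paper's implementation; the layer sizes, the `GuidedSTN` class, and the example target transform are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GuidedSTN(nn.Module):
    """Minimal Spatial Transformer whose predicted affine transform can be
    regularized toward a region derived from landmarks or saliency.
    Architecture details are illustrative assumptions."""
    def __init__(self):
        super().__init__()
        # Localization network: small CNN that feeds a regressor
        # producing the 6 affine parameters.
        self.loc = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=7), nn.MaxPool2d(2), nn.ReLU(),
            nn.Conv2d(8, 10, kernel_size=5), nn.MaxPool2d(2), nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Linear(10 * 3 * 3, 32), nn.ReLU(), nn.Linear(32, 6),
        )
        # Initialize the regressor to emit the identity transform,
        # the usual STN starting point.
        self.fc[-1].weight.data.zero_()
        self.fc[-1].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):
        theta = self.fc(self.loc(x).flatten(1)).view(-1, 2, 3)
        # Differentiable sampling: warp the input with the predicted transform.
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False), theta

def guidance_loss(theta, theta_target):
    # Penalize deviation from the transform that crops the
    # landmark/saliency-defined region (computed offline).
    return F.mse_loss(theta, theta_target)

x = torch.randn(2, 1, 28, 28)           # toy batch of face crops
model = GuidedSTN()
warped, theta = model(x)
# Hypothetical target: a 0.8x zoom centered on the detected face region.
theta_target = torch.tensor([[[0.8, 0.0, 0.0],
                              [0.0, 0.8, 0.0]]]).expand(2, -1, -1)
loss = guidance_loss(theta, theta_target)
```

In training, this guidance term would be weighted against the classification loss, so the network may still deviate from the hardcoded region when that improves recognition.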
