Deep Learning for Spatio-Temporal Modeling of Dynamic Spontaneous Emotions

Facial expressions involve dynamic morphological changes in a face and convey information about the expresser’s feelings. Each emotion has a specific spatial deformation over the face and a temporal profile with distinct time segments. We aim to model dynamic human emotional behavior by taking into account the visual content of the face and its evolution. Because an expression can speed up or slow down, it is important to incorporate information from the local neighborhood of frames (short-term dependencies) and from the global setting (long-term dependencies) in order to summarize the segment context despite its time variations. A 3D Convolutional Neural Network (3D-CNN) is used to learn early local spatiotemporal features; it is designed to capture subtle spatiotemporal changes that may occur on the face. Then, a Convolutional Long Short-Term Memory (ConvLSTM) network is designed to learn semantic information by taking into account longer spatiotemporal dependencies. The ConvLSTM network helps capture the global visual saliency of the expression: it locates and learns features in space and time that stand out from their local neighbors and so signal distinctive facial expression features along the entire sequence. Finally, a weighted Spatial Pyramid Pooling layer builds invariant representations by aggregating global spatiotemporal features at increasingly fine resolutions.
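
The final pooling step can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names `bin_edges` and `weighted_pyramid_pool`, the pyramid levels (1, 2, 4), the use of max pooling, the binning over time as well as space, and the uniform default level weights are all assumptions made for illustration. The sketch only shows how pooling over a pyramid of increasingly fine grids yields a fixed-length descriptor whatever the length or resolution of the input feature volume.

```python
def bin_edges(size, n):
    """Split range(size) into n contiguous, non-empty bins (floor start, ceil end)."""
    return [((i * size) // n, -(-((i + 1) * size) // n)) for i in range(n)]


def weighted_pyramid_pool(volume, levels=(1, 2, 4), weights=None):
    """Max-pool a T x H x W feature volume over a pyramid of grids.

    volume  : nested lists indexed as volume[t][h][w]
    levels  : grid resolutions; level n uses n * n * n cells (assumed values)
    weights : one scalar per level (hypothetical; uniform if omitted)
    Returns a flat list of length sum(n ** 3 for n in levels).
    """
    t_len, h_len, w_len = len(volume), len(volume[0]), len(volume[0][0])
    if weights is None:
        weights = [1.0 / len(levels)] * len(levels)
    pooled = []
    for n, weight in zip(levels, weights):
        for t0, t1 in bin_edges(t_len, n):
            for h0, h1 in bin_edges(h_len, n):
                for w0, w1 in bin_edges(w_len, n):
                    # Strongest response inside this spatio-temporal cell.
                    cell_max = max(
                        volume[t][h][w]
                        for t in range(t0, t1)
                        for h in range(h0, h1)
                        for w in range(w0, w1)
                    )
                    pooled.append(weight * cell_max)
    return pooled


def make_volume(t_len, h_len, w_len):
    """Toy feature volume whose values depend on position (for demonstration)."""
    return [[[(t * 7 + h * 3 + w) % 11 for w in range(w_len)]
             for h in range(h_len)] for t in range(t_len)]


# Two clips of different duration and resolution give descriptors of equal length.
short_clip = weighted_pyramid_pool(make_volume(5, 6, 6))
long_clip = weighted_pyramid_pool(make_volume(12, 9, 10))
print(len(short_clip), len(long_clip))
```

With levels (1, 2, 4) the descriptor has 1 + 8 + 64 = 73 entries per feature channel, so a slow expression and a fast one map to vectors of the same size. In a trained network the level weights would presumably be learned or tuned rather than fixed as here.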
