Learning deep facial expression features from image and optical flow sequences using 3D CNN

Facial expression is highly correlated with facial motion. Depending on whether the temporal information of facial motion is used, facial expression features can be classified as static or dynamic. Static features, which mainly comprise geometric and appearance features, can be extracted by convolution or other learned filters; dynamic features, which aim to model the dynamic properties of facial motion, can be computed through optical flow or other methods. With 3D convolutional neural networks (CNNs), both types of features become easy to extract. In this paper, a 3D CNN architecture is presented that learns static and dynamic features from facial image sequences and extracts high-level dynamic features from optical flow sequences. Two types of dense optical flow, which contain the tracking information of facial muscle movement, are calculated using different image pair construction methods. One is the common optical flow, and the other is an enhanced variant called accumulative optical flow. Four components of each type of optical flow are used in the experiments. The experiments are conducted on three databases: two acted databases and one nearly realistic database. The experiments on the two acted databases achieve state-of-the-art accuracy and indicate that the vertical component of optical flow has an advantage over the other components in recognizing facial expressions. The results on all three databases show that more discriminative features can be learned from image sequences than from optical flow or accumulative optical flow sequences, and that accumulative optical flow contains more motion information than optical flow provided the frame distance of the image pairs used to calculate them is not too large.
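The abstract says the two flow types differ only in how image pairs are constructed, but it does not give the construction. The sketch below illustrates one plausible reading and is an assumption, not the authors' method: common optical flow pairs each frame with the frame a fixed distance later, while accumulative optical flow pairs a fixed reference frame (here frame 0) with progressively later frames. The function names and the `distance` parameter are hypothetical. A dense optical flow routine would then be run on each returned index pair.

```python
from typing import List, Tuple


def common_flow_pairs(num_frames: int, distance: int = 1) -> List[Tuple[int, int]]:
    """Assumed 'common' construction: pair frame t with frame t + distance."""
    if distance < 1:
        raise ValueError("distance must be >= 1")
    return [(t, t + distance) for t in range(num_frames - distance)]


def accumulative_flow_pairs(num_frames: int, distance: int = 1) -> List[Tuple[int, int]]:
    """Assumed 'accumulative' construction: pair reference frame 0 with
    frames distance, 2 * distance, ... so motion builds up over the sequence."""
    if distance < 1:
        raise ValueError("distance must be >= 1")
    return [(0, t) for t in range(distance, num_frames, distance)]


if __name__ == "__main__":
    # Index pairs for a five-frame sequence with a frame distance of 1.
    print(common_flow_pairs(5))
    print(accumulative_flow_pairs(5))
```

Under this reading, each accumulative pair spans a growing time interval. That fits the abstract's remark that accumulative flow carries more motion information as long as the frame distance is not too large.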
