Expression recognition in the wild with transfer learning

Expression recognition in the wild is a challenging task because of interference from varied environments. In this paper, we propose a transfer-learning method that takes two representative transformations of grayscale images as input and fuses their results at the decision level to improve overall performance; this design also addresses the dimensional mismatch that arises when applying pre-trained deep neural networks to grayscale images. Each network consists of the body of a pre-trained deep network, which is frozen during training, and two additional layers trained for our expression-classification task. Our method achieves an accuracy of 63.72% on FER-2013, an 11.18% improvement over the baseline, with less computation than other deep-learning methods, which demonstrates the complementary effect of the two different inputs as well as the strong generalization ability of the features extracted by the pre-trained network.
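The sketch below illustrates the kind of two-branch setup the abstract describes: a frozen pre-trained body with two trainable layers per branch, grayscale images expanded to three channels by two different transformations, and decision-level fusion of the branch outputs. It is a minimal sketch, not the authors' code; the InceptionV3 backbone, the specific channel-replication and gradient-stacking transforms, the 256-unit hidden layer, and the input size are assumptions, since the abstract does not specify them.

```python
# Minimal sketch (assumptions flagged below), not the paper's implementation.
import numpy as np
import tensorflow as tf

NUM_CLASSES = 7          # FER-2013 expression categories
INPUT_SIZE = (139, 139)  # assumed resize target; InceptionV3 requires >= 75x75 inputs

def build_branch():
    """One branch: frozen pre-trained body + two trainable layers."""
    base = tf.keras.applications.InceptionV3(
        weights="imagenet", include_top=False,
        input_shape=INPUT_SIZE + (3,), pooling="avg")
    base.trainable = False  # freeze the pre-trained body during training
    return tf.keras.Sequential([
        base,
        tf.keras.layers.Dense(256, activation="relu"),              # trainable layer 1 (size assumed)
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),   # trainable layer 2
    ])

def gray_to_rgb_replicate(gray):
    """Transform A (assumed): replicate the single grayscale channel three times."""
    return np.repeat(gray[..., np.newaxis], 3, axis=-1)

def gray_to_rgb_gradient(gray):
    """Transform B (assumed): stack the image with its x/y intensity gradients."""
    gx, gy = np.gradient(gray.astype(np.float32), axis=(-1, -2))
    return np.stack([gray, gx, gy], axis=-1)

def fuse_predictions(branch_a, branch_b, gray_batch):
    """Decision-level fusion: average the two branches' class probabilities."""
    pa = branch_a.predict(gray_to_rgb_replicate(gray_batch))
    pb = branch_b.predict(gray_to_rgb_gradient(gray_batch))
    return np.argmax((pa + pb) / 2.0, axis=-1)
```

In use, each branch would be compiled and trained separately on its own transformed copy of the grayscale training set (only the two dense layers update, since the backbone is frozen), and `fuse_predictions` would then combine the two softmax outputs at test time.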
