FaceNet2ExpNet: Regularizing a Deep Face Recognition Net for Expression Recognition

Relatively small data sets available for expression recognition research make the training of deep networks very challenging. Although fine-tuning can partially alleviate the issue, the performance is still below acceptable levels as the deep features probably contain redundant information from the pretrained domain. In this paper, we present FaceNet2ExpNet, a novel idea to train an expression recognition network based on static images. We first propose a new distribution function to model the high-level neurons of the expression network. Based on this, a two-stage training algorithm is carefully designed. In the pre-training stage, we train the convolutional layers of the expression net, regularized by the face net; In the refining stage, we append fully-connected layers to the pre-trained convolutional layers and train the whole network jointly. Visualization results show that the model trained with our method captures improved high-level expression semantics. Evaluations on four public expression databases, CK+, Oulu- CASIA, TFD, and SFEW demonstrate that our method achieves better results than state-of-the-art.

[1]  Dumitru Erhan,et al.  Training Deep Neural Networks on Noisy Labels with Bootstrapping , 2014, ICLR.

[2]  Shiguang Shan,et al.  Deeply Learning Deformable Facial Action Parts Model for Dynamic Expression Analysis , 2014, ACCV.

[3]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[4]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Ivor W. Tsang,et al.  Feature Disentangling Machine - A Novel Approach of Feature Selection and Disentangling in Facial Expression Analysis , 2014, ECCV.

[6]  Soo-Young Lee,et al.  Hierarchical committee of deep convolutional neural networks for robust facial expression recognition , 2016, Journal on Multimodal User Interfaces.

[7]  Saeed Meshgini,et al.  Facial expression recognition based on Local Binary Patterns , 2016 .

[8]  Bolei Zhou,et al.  Learning Deep Features for Scene Recognition using Places Database , 2014, NIPS.

[9]  Ira Kemelmacher-Shlizerman,et al.  The MegaFace Benchmark: 1 Million Faces for Recognition at Scale , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Shengcai Liao,et al.  Learning Face Representation from Scratch , 2014, ArXiv.

[11]  Andrew Zisserman,et al.  Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[12]  Tal Hassner,et al.  Emotion Recognition in the Wild via Convolutional Neural Networks and Mapped Binary Patterns , 2015, ICMI.

[13]  Ping Liu,et al.  Facial Expression Recognition via a Boosted Deep Belief Network , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Matti Pietikäinen,et al.  Facial expression recognition from near-infrared videos , 2011, Image Vis. Comput..

[15]  G. Cottrell,et al.  EMPATH: A Neural Network that Categorizes Facial Expressions , 2002, Journal of Cognitive Neuroscience.

[16]  Gwen Littlewort,et al.  Recognizing facial expression: machine learning and application to spontaneous behavior , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[17]  Shuicheng Yan,et al.  Peak-Piloted Deep Network for Facial Expression Recognition , 2016, ECCV.

[18]  Xiaoou Tang,et al.  A large-scale car dataset for fine-grained categorization and verification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Takayuki Okatani,et al.  Automatic Attribute Discovery with Neural Activations , 2016, ECCV.

[20]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[21]  Shiguang Shan,et al.  Learning Expressionlets on Spatio-temporal Manifold for Dynamic Facial Expression Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Shiguang Shan,et al.  AU-aware Deep Networks for facial expression recognition , 2013, 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).

[23]  Qingshan Liu,et al.  Learning active facial patches for expression analysis , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Takeo Kanade,et al.  The Extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-specified expression , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[25]  Pascal Vincent,et al.  Disentangling Factors of Variation for Facial Expression Recognition , 2012, ECCV.

[26]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[27]  Yoshua Bengio,et al.  Challenges in representation learning: A report on three machine learning contests , 2013, Neural Networks.

[28]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[29]  Matti Pietikäinen,et al.  Dynamic Facial Expression Recognition Using Longitudinal Facial Expression Atlases , 2012, ECCV.

[30]  Stefan Winkler,et al.  Deep Learning for Emotion Recognition on Small Datasets using Transfer Learning , 2015, ICMI.

[31]  Junmo Kim,et al.  Deep Temporal Appearance-Geometry Network for Facial Expression Recognition , 2015, ArXiv.

[32]  Mohammad H. Mahoor,et al.  Going deeper in facial expression recognition using deep neural networks , 2015, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).

[33]  Andrew Zisserman,et al.  Deep Face Recognition , 2015, BMVC.

[34]  Gaurav Sharma,et al.  LOMo: Latent Ordinal Model for Facial Analysis in Videos , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Cha Zhang,et al.  Image based Static Facial Expression Recognition with Multiple Deep Network Learning , 2015, ICMI.

[36]  Yuting Zhang,et al.  Learning to Disentangle Factors of Variation with Manifold Interaction , 2014, ICML.

[37]  Qi Tian,et al.  InterActive: Inter-Layer Activeness Propagation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[39]  Geoffrey E. Hinton,et al.  On deep generative models with applications to recognition , 2011, CVPR 2011.

[40]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[41]  Tamás D. Gedeon,et al.  Video and Image based Emotion Recognition Challenges in the Wild: EmotiW 2015 , 2015, ICMI.

[42]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[43]  Yoshua Bengio,et al.  FitNets: Hints for Thin Deep Nets , 2014, ICLR.

[44]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[45]  Cordelia Schmid,et al.  A Spatio-Temporal Descriptor Based on 3D-Gradients , 2008, BMVC.