Facial action recognition using very deep networks for highly imbalanced class distribution

In natural conditions, positive samples of facial actions are far fewer than negative samples. Such highly imbalanced class distributions can make the training error of neural networks converge very slowly in facial action recognition. Traditional methods tackle the class-imbalance problem by changing the data distribution, which makes it difficult to avoid losing useful information. In this paper, we instead use very deep (>10 layers) architectures to increase the chance that network training converges at an acceptable rate on highly imbalanced data sets. Experimental results on the EmotioNet Challenge data set show that the error rates of very deep convolutional networks converge to 40% after 90 epochs, while those of shallower networks converge only to 60%. The results also show that the very deep network outperforms the shallower network by 0.2 in accuracy score. The proposed neural networks won first place in the first track, automatic detection of action units (AUs), of the EmotioNet Challenge.
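To make concrete why changing the data distribution can lose information, the following is a minimal sketch of random undersampling, one common rebalancing approach. It is an illustration only and not the method proposed in this paper; the function name `undersample_negatives`, the 5% positive rate, and the data set size are hypothetical.

```python
import random


def undersample_negatives(labels, seed=0):
    """Return indices of a class-balanced subset made by randomly dropping negatives."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Keep only as many negatives as there are positives.
    kept_neg = rng.sample(neg, min(len(pos), len(neg)))
    return sorted(pos + kept_neg)


# Hypothetical AU labels: 50 positives among 1000 samples (5% positive rate).
labels = [1] * 50 + [0] * 950
kept = undersample_negatives(labels)
discarded = len(labels) - len(kept)
print(len(kept), discarded)  # prints "100 900"
```

In this toy setting, balancing the classes discards 900 of the 1000 samples, which is the kind of information loss that rebalancing has to contend with.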
