Edge Convolutional Network for Facial Action Intensity Estimation

In this paper, we propose a novel convolutional neural architecture for facial action unit (AU) intensity estimation. While Convolutional Neural Networks (CNNs) have shown great promise in a wide range of computer vision tasks, these achievements have not translated as well to facial expression analysis, where hand-crafted features (e.g., the Histogram of Oriented Gradients) remain very competitive. We introduce a novel Edge Convolutional Network (ECN) that learns edge-like detectors capable of capturing subtle changes in facial appearance, such as fine wrinkles and facial muscle contours, at multiple orientations and frequencies. The core novelty of our ECN model lies in its first layer, which integrates three main components: an edge filter generator, a receptive gate, and a filter rotator. All three components are differentiable, so the ECN model is end-to-end trainable and learns the edge detectors that matter for facial expression analysis. Experiments on two facial action unit datasets show that the proposed ECN outperforms state-of-the-art methods on both AU intensity estimation tasks.
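To make the idea of edge detectors "at multiple orientations and frequencies" concrete, the following is a minimal NumPy sketch of a fixed bank of oriented edge filters. It is not the ECN's edge filter generator or filter rotator: the abstract does not specify how those are parametrised, so the Gabor-like form (a Gaussian envelope times a sine carrier) and the names `gabor_kernel`, `edge_filter_bank`, `size`, `sigma`, and `frequencies` are all assumptions made for illustration. In the ECN the corresponding filters are learned end to end rather than hand-set.

```python
import numpy as np


def gabor_kernel(size, theta, frequency, sigma):
    """Odd-symmetric Gabor-like edge kernel (an assumed parametrisation).

    size: odd kernel width/height in pixels
    theta: orientation in radians
    frequency: carrier frequency in cycles per pixel
    sigma: standard deviation of the Gaussian envelope
    """
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinate frame so the carrier varies along direction theta.
    x_rot = xs * np.cos(theta) + ys * np.sin(theta)
    y_rot = -xs * np.sin(theta) + ys * np.cos(theta)
    envelope = np.exp(-(x_rot ** 2 + y_rot ** 2) / (2.0 * sigma ** 2))
    # A sine carrier is antisymmetric, so the kernel responds to edges
    # and sums to zero over the symmetric grid.
    carrier = np.sin(2.0 * np.pi * frequency * x_rot)
    return envelope * carrier


def edge_filter_bank(size=7, n_orientations=4, frequencies=(0.1, 0.25), sigma=2.0):
    """Stack kernels for every (frequency, orientation) pair."""
    thetas = np.arange(n_orientations) * np.pi / n_orientations
    kernels = [
        gabor_kernel(size, theta, freq, sigma)
        for freq in frequencies
        for theta in thetas
    ]
    return np.stack(kernels)  # shape: (len(frequencies) * n_orientations, size, size)


bank = edge_filter_bank()
```

With the default arguments, `bank` holds eight 7x7 kernels (two frequencies times four orientations). In a learned setting, the orientation, frequency, and envelope width would be trainable parameters instead of fixed constants.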
