Joint Facial Action Unit Intensity Prediction and Region Localisation

Facial Action Unit (AU) intensity prediction is essential to facial expression analysis and emotion recognition, and has therefore attracted much attention from the community. In comparison, AU localization, although important for emotion visualization and tracking, has remained relatively unexplored. Moreover, because most existing AU intensity prediction methods take a cropped face image as input, their run-time speed is penalized by pre-processing steps such as face detection and alignment, and their inference speed does not scale well to images containing multiple faces. To alleviate these problems, we propose a joint AU intensity prediction and localization method that operates directly on the whole input image, eliminating the need for any pre-processing and achieving the same inference speed regardless of the number of faces in the image. Based on the observation that different degrees of relevance exist between AU intensity categories, we propose a flexible cost function. At inference time, we introduce a non-maximum intensity suppression model to refine the predictions. To leverage existing datasets without AU region ground truth, we also propose an automatic AU region labeling method. Experiments on two benchmark databases, DISFA and FERA2015, show that the proposed approach outperforms state-of-the-art methods on three metrics (ICC, MAE, and F1) for the AU intensity prediction task.
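The relevance-aware cost and the non-maximum intensity suppression step lend themselves to short illustrations. The sketch below is not the paper's implementation; it is a minimal PyTorch rendering of two plausible instantiations. The first is an ordinal cost in which a misprediction is penalized in proportion to its distance from the ground-truth intensity level (AU intensities are ordinal, 0-5 on the FACS scale, so confusing level 4 with level 3 should cost less than confusing it with level 0). The second is a greedy suppression routine modeled on standard detection NMS. The function names (ordinal_relevance_loss, intensity_nms) and the distance-based weighting are assumptions, not details taken from the paper.

import torch
import torch.nn.functional as F

def ordinal_relevance_loss(logits, targets, num_levels=6):
    # Hypothetical relevance-aware cost: the expected ordinal distance
    # between the predicted intensity distribution and the true level.
    # logits:  (batch, num_levels) raw scores per intensity category
    # targets: (batch,) integer ground-truth intensity levels in [0, 5]
    probs = F.softmax(logits, dim=1)
    levels = torch.arange(num_levels, device=logits.device).float()
    dist = (levels.unsqueeze(0) - targets.unsqueeze(1).float()).abs()
    # Probability mass placed on distant categories is penalized more,
    # so near-miss predictions cost less than far misses.
    return (probs * dist).sum(dim=1).mean()

def intensity_nms(boxes, scores, iou_thresh=0.5):
    # Stand-in for the paper's non-maximum intensity suppression:
    # greedily keep the highest-scoring face-region prediction and drop
    # overlapping lower-scoring ones, as in standard detection NMS.
    # boxes:  (N, 4) tensor of [x1, y1, x2, y2]; scores: (N,) tensor
    order = scores.argsort(descending=True)
    keep = []
    while order.numel() > 0:
        i = order[0]
        keep.append(int(i))
        if order.numel() == 1:
            break
        rest = order[1:]
        # Intersection-over-union between the kept box and the rest.
        x1 = torch.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = torch.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = torch.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = torch.minimum(boxes[i, 3], boxes[rest, 3])
        inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]
    return keep

Under these assumptions, the loss reduces to a plain expected-L1 cost over intensity categories, and the suppression step differs from detection NMS only in what the score encodes; the paper's actual formulation may weight categories differently or suppress on predicted intensity rather than detection confidence.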
