Unsupervised Learning Facial Parameter Regressor for Action Unit Intensity Estimation via Differentiable Renderer

Facial action unit (AU) intensity is an index to describe all visually discernible facial movements. Most existing methods learn intensity estimator with limited AU data, while they lack of generalization ability out of the dataset. In this paper, we present a framework to predict the facial parameters (including identity parameters and AU parameters) based on a bone-driven face model (BDFM) under different views. The proposed framework consists of a feature extractor, a generator, and a facial parameter regressor. The regressor can fit the physical meaning parameters of the BDFM from a single face image with the help of the generator, which maps the facial parameters to the game-face images as a differentiable renderer. Besides, identity loss, loopback loss, and adversarial loss can improve the regressive results. Quantitative evaluations are performed on two public databases BP4D and DISFA, which demonstrates that the proposed method can achieve comparable or better performance than the state-of-the-art methods. What's more, the qualitative results also demonstrate the validity of our method in the wild.

[1]  Mohammad H. Mahoor,et al.  Automatic detection of non-posed facial action units , 2012, 2012 19th IEEE International Conference on Image Processing.

[2]  Xiangyu Zhu,et al.  High-fidelity Pose and Expression Normalization for face recognition in the wild , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  H. Emrah Tasli,et al.  Deep learning based FACS Action Unit occurrence and intensity estimation , 2015, 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).

[4]  Daniel Cremers,et al.  Learning by Association — A Versatile Semi-Supervised Training Method for Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Louis-Philippe Morency,et al.  Deep Constrained Local Models for Facial Landmark Detection , 2016, ArXiv.

[6]  Zhenwei Shi,et al.  Face-to-Parameter Translation for Game Character Auto-Creation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[7]  Maja Pantic,et al.  Latent trees for estimating intensity of Facial Action Units , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  P. Ekman,et al.  Facial action coding system: a technique for the measurement of facial movement , 1978 .

[9]  Mohammad H. Mahoor,et al.  Temporal Facial Expression Modeling for Automated Action Unit Intensity Measurement , 2014, 2014 22nd International Conference on Pattern Recognition.

[10]  Christian Theobalt,et al.  Reconstruction of Personalized 3D Face Rigs from Monocular Video , 2016, ACM Trans. Graph..

[11]  Shaun J. Canavan,et al.  BP4D-Spontaneous: a high-resolution spontaneous 3D dynamic facial expression database , 2014, Image Vis. Comput..

[12]  Maja Pantic,et al.  Web-based database for facial expression analysis , 2005, 2005 IEEE International Conference on Multimedia and Expo.

[13]  Patrick Pérez,et al.  MoFA: Model-Based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[14]  Maja Pantic,et al.  Continuous Pain Intensity Estimation from Facial Expressions , 2012, ISVC.

[15]  Matthew Turk,et al.  A Morphable Model For The Synthesis Of 3D Faces , 1999, SIGGRAPH.

[16]  Baoyuan Wu,et al.  Joint Representation and Estimator Learning for Facial Action Unit Intensity Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Vladimir Pavlovic,et al.  Context-Sensitive Dynamic Ordinal Regression for Intensity Estimation of Facial Action Units , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Baoyuan Wang,et al.  Joint Face Detection and Facial Motion Retargeting for Multiple Faces , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Xiangyu Zhu,et al.  Face Alignment in Full Pose Range: A 3D Total Solution , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Qiang Ji,et al.  Bilateral Ordinal Relevance Multi-instance Regression for Facial Action Unit Intensity Estimation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21]  Zheng Zhang,et al.  FERA 2017 - Addressing Head Pose in the Third Facial Expression Recognition and Analysis Challenge , 2017, 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017).

[22]  Mohammad H. Mahoor,et al.  DISFA: A Spontaneous Facial Action Intensity Database , 2013, IEEE Transactions on Affective Computing.

[23]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[24]  Qiang Ji,et al.  Facial Expression Intensity Estimation Using Ordinal Information , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Kun Zhou,et al.  Displaced dynamic expression regression for real-time facial tracking and animation , 2014, ACM Trans. Graph..

[26]  Shaun J. Canavan,et al.  Multimodal Spontaneous Emotion Corpus for Human Behavior Analysis , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  William T. Freeman,et al.  Unsupervised Training for 3D Morphable Model Regression , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[29]  Ram Nevatia,et al.  Deep, Landmark-Free FAME: Face Alignment, Modeling, and Expression Estimation , 2019, International Journal of Computer Vision.

[30]  Juyong Zhang,et al.  Disentangled Representation Learning for 3D Face Shape , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Louis-Philippe Morency,et al.  Convolutional Experts Constrained Local Model for Facial Landmark Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[32]  Xing Ji,et al.  CosFace: Large Margin Cosine Loss for Deep Face Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[33]  Michael J. Black,et al.  Learning to Regress 3D Face Shape and Expression From an Image Without 3D Supervision , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Vladimir Pavlovic,et al.  Deep Structured Learning for Facial Action Unit Intensity Estimation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Tal Hassner,et al.  Regressing Robust and Discriminative 3D Morphable Models with a Very Deep Neural Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Gang Yu,et al.  BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation , 2018, ECCV.

[37]  Zunlei Feng,et al.  Unsupervised Facial Action Unit Intensity Estimation via Differentiable Optimization , 2020, ArXiv.

[38]  J. Fleiss,et al.  Intraclass correlations: uses in assessing rater reliability. , 1979, Psychological bulletin.

[39]  Tieniu Tan,et al.  A Light CNN for Deep Face Representation With Noisy Labels , 2015, IEEE Transactions on Information Forensics and Security.

[40]  Björn W. Schuller,et al.  DeepCoder: Semi-Parametric Variational Autoencoders for Automatic Facial Action Coding , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[41]  Takeo Kanade,et al.  The Extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-specified expression , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[42]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[43]  Yong Zhang,et al.  Context-Aware Feature and Label Fusion for Facial Action Unit Intensity Estimation With Partially Labeled Data , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[44]  Matan Sela,et al.  3D Face Reconstruction by Learning from Synthetic Data , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[45]  Timo Aila,et al.  A Style-Based Generator Architecture for Generative Adversarial Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[47]  M. Pantic,et al.  Induced Disgust , Happiness and Surprise : an Addition to the MMI Facial Expression Database , 2010 .

[48]  Maja Pantic,et al.  Doubly Sparse Relevance Vector Machine for Continuous Facial Behavior Estimation , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[49]  Fernando De la Torre,et al.  Continuous AU intensity estimation using localized, sparse facial feature space , 2013, 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).

[50]  Peter Robinson,et al.  Cross-dataset learning and person-specific normalisation for automatic Action Unit detection , 2015, 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).

[51]  Peter Robinson,et al.  Continuous Conditional Neural Fields for Structured Regression , 2014, ECCV.

[52]  Andrea Vedaldi,et al.  Instance Normalization: The Missing Ingredient for Fast Stylization , 2016, ArXiv.

[53]  M. Zollhöfer,et al.  Self-Supervised Multi-level Face Model Learning for Monocular Reconstruction at Over 250 Hz , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[54]  Stefanos Zafeiriou,et al.  Markov Random Field Structures for Facial Action Unit Intensity Estimation , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[55]  Zuheng Ming,et al.  Facial Action Units intensity estimation by the fusion of features with multi-kernel Support Vector Machine , 2015, 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).

[56]  Enhua Wu,et al.  Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[57]  Qiang Ji,et al.  Weakly-Supervised Deep Convolutional Neural Network Learning for Facial Action Unit Intensity Estimation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[58]  Sami Romdhani,et al.  A 3D Face Model for Pose and Illumination Invariant Face Recognition , 2009, 2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance.

[59]  Jesse Hoey,et al.  From individual to group-level emotion recognition: EmotiW 5.0 , 2017, ICMI.

[60]  Yiying Tong,et al.  FaceWarehouse: A 3D Facial Expression Database for Visual Computing , 2014, IEEE Transactions on Visualization and Computer Graphics.

[61]  Vladimir Pavlovic,et al.  Copula Ordinal Regression for Joint Estimation of Facial Action Unit Intensity , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[62]  Xiaoou Tang,et al.  Facial Landmark Detection by Deep Multi-task Learning , 2014, ECCV.

[63]  Qiong Cao,et al.  MMFace: A Multi-Metric Regression Network for Unconstrained Face Reconstruction , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).