Pose-Independent Facial Action Unit Intensity Regression Based on Multi-Task Deep Transfer Learning

Facial expression recognition plays an increasingly important role in human behavior analysis and human-computer interaction. Facial action units (AUs) coded by the Facial Action Coding System (FACS) provide rich cues for the interpretation of facial expressions. Much past work on AU analysis used only frontal-view images, but natural images contain a much wider variety of head poses. The FG 2017 Facial Expression Recognition and Analysis challenge (FERA 2017) requires participants to estimate AU occurrence and intensity under nine different pose angles. This paper proposes a multi-task deep network that addresses the AU intensity estimation sub-challenge of FERA 2017. The network performs pose estimation and pose-dependent AU intensity estimation simultaneously, then uses the estimated pose to merge the pose-dependent AU intensity estimates into a single estimate. The two tasks share the bottom layers of a deep convolutional neural network (CNN), which are transferred from a model pre-trained on ImageNet. Our model outperforms the baseline results and achieves balanced performance across the nine pose angles for most AUs.
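The merging step described above can be illustrated with a minimal sketch. The abstract does not state how the estimated pose is used to combine the pose-dependent estimates, so this sketch assumes one plausible scheme: the pose head outputs one score per pose, a softmax turns the scores into weights, and the merged intensity is the weighted sum of the per-pose estimates. The names `softmax` and `merge_au_estimates` and the input layout are hypothetical, chosen for illustration only.

```python
import math


def softmax(logits):
    """Convert raw pose scores into weights that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def merge_au_estimates(pose_logits, per_pose_au):
    """Merge pose-dependent AU intensity estimates into one estimate per AU.

    pose_logits: list of P raw pose scores (assumed output of a pose head).
    per_pose_au: list of P lists, each holding A intensity estimates
                 (assumed output of P pose-specific regressors).
    Returns a list of A merged intensities.

    Assumption: soft weighting by pose probability. The paper may instead
    select the single most likely pose or use another scheme.
    """
    weights = softmax(pose_logits)
    num_aus = len(per_pose_au[0])
    return [
        sum(w * estimates[a] for w, estimates in zip(weights, per_pose_au))
        for a in range(num_aus)
    ]


if __name__ == "__main__":
    # Three poses, two AUs; the second pose is the most likely one.
    merged = merge_au_estimates(
        [0.1, 2.0, -1.0],
        [[1.0, 0.5], [3.0, 2.5], [0.0, 4.0]],
    )
    print(merged)
```

When one pose score dominates, the merged value approaches that pose's estimate; when the scores are equal, it reduces to the plain average over poses.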
