Dynamic Facial Models for Video-Based Dimensional Affect Estimation

Dimensional affect estimation from a face video is a challenging task, mainly due to the large number of possible facial displays made up of behaviour primitives such as facial muscle actions. The displays vary not only in composition but also in temporal evolution, with each display composed of behaviour primitives that vary in their short- and long-term characteristics. Most existing work models affect with complex hierarchical recurrent models that capture short-term dynamics poorly. In this paper, we propose to encode these short-term facial shape and appearance dynamics into an image, so that only the semantically meaningful information is retained in the resulting dynamic face images. We also propose binary dynamic facial masks that remove 'stable pixels' from the dynamic images. This step filters out non-dynamic information, i.e. only pixels that have changed over the sequence are retained. The final proposed Dynamic Facial Model (DFM) then encodes both the filtered facial appearance and the shape dynamics of the image sequence preceding a given frame into a three-channel raster image. A CNN-RNN architecture is tasked with modelling primarily the long-term changes. Experiments show that our dynamic face images achieve superior performance over standard RGB face images on the dimensional affect prediction task.
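The two core operations described above can be illustrated with a minimal sketch. The dynamic-image step is shown here via approximate rank-pooling weights in the spirit of Bilen et al.'s dynamic image networks; the function names, the threshold, and the toy data are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch: collapse a short frame sequence into one "dynamic image" and
# mask out pixels that never changed ("stable pixels"). This is an assumed
# approximation, not the authors' exact DFM pipeline.
import numpy as np

def dynamic_image(frames):
    """Collapse a (T, H, W) sequence into a single image using approximate
    rank-pooling weights alpha_t = 2t - T - 1 (t = 1..T)."""
    T = frames.shape[0]
    alphas = 2 * np.arange(1, T + 1) - T - 1           # per-frame weights
    return np.tensordot(alphas, frames, axes=(0, 0))   # weighted temporal sum

def stable_pixel_mask(frames, thresh=1e-6):
    """Binary mask: 1 where a pixel varied over the sequence, 0 where stable.
    The threshold is an illustrative choice."""
    return (frames.std(axis=0) > thresh).astype(frames.dtype)

# Toy usage: a 4-frame, 2x2 grayscale sequence where only one pixel moves.
frames = np.zeros((4, 2, 2))
frames[:, 0, 0] = [0.0, 0.2, 0.4, 0.6]   # the only changing pixel
di = dynamic_image(frames) * stable_pixel_mask(frames)
# Only the changing pixel survives; constant pixels are zeroed by the mask
# (and also cancel in the weighted sum, since the weights sum to zero).
```

Stacking such filtered appearance and shape channels would yield a three-channel raster image of the kind the DFM feeds to the CNN-RNN.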
