Time-Delay Neural Network for Continuous Emotional Dimension Prediction From Facial Expression Sequences

Automatic continuous affective state prediction from naturalistic facial expressions is a challenging but important research topic in human-computer interaction. One of the main challenges is modeling the dynamics that characterize naturalistic expressions. In this paper, a novel two-stage automatic system is proposed to continuously predict affective dimension values from facial expression videos. In the first stage, traditional regression methods are used to classify each individual video frame, while in the second stage, a time-delay neural network (TDNN) is proposed to model the temporal relationships between consecutive predictions. The two-stage approach separates the modeling of emotional state dynamics from the individual, feature-based emotional state prediction step. In doing so, the temporal information used by the TDNN is not biased by the high variability between features of consecutive frames, which allows the network to more easily exploit the slowly changing dynamics between emotional states. The system was fully tested and evaluated on three different facial expression video datasets. Our experimental results demonstrate that the use of a two-stage approach combined with a TDNN that takes into account previously classified frames significantly improves the overall performance of continuous emotional state estimation in naturalistic facial expressions. The proposed approach won the affect recognition sub-challenge of the Third International Audio/Visual Emotion Recognition Challenge.
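To illustrate the second stage described above, the following is a minimal sketch of a TDNN applied to a sequence of frame-level affect predictions: each output at time t is computed from a delay window of the preceding predictions, so the network can smooth out frame-to-frame variability. All weights, window sizes, and function names here are hypothetical illustrations, not the paper's actual trained model.

```python
import math

def tdnn_predict(frame_preds, w_in, b_in, w_out, b_out, delay=4):
    """Single-hidden-layer TDNN sketch (hypothetical weights):
    the output at time t depends on the window of the current and
    `delay` previous frame-level predictions."""
    outputs = []
    for t in range(len(frame_preds)):
        # gather delayed inputs, padding the start by repeating frame 0
        window = [frame_preds[max(t - d, 0)] for d in range(delay, -1, -1)]
        # hidden layer: one tanh unit per row of w_in
        hidden = [math.tanh(sum(w * x for w, x in zip(row, window)) + b)
                  for row, b in zip(w_in, b_in)]
        # linear output layer over the hidden activations
        outputs.append(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return outputs

# Toy example: 2 hidden units over a 5-frame delay window.
delay = 4
w_in = [[0.2] * (delay + 1), [-0.2] * (delay + 1)]
b_in = [0.0, 0.0]
w_out = [0.5, -0.5]
b_out = 0.0
noisy_frame_predictions = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7]
smoothed = tdnn_predict(noisy_frame_predictions, w_in, b_in, w_out, b_out, delay)
```

In the paper's setting, the inputs to this stage would be the first-stage per-frame regression outputs rather than raw features; the delay window is what lets the network exploit the slow dynamics of emotional states.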
