End-to-end learning for dimensional emotion recognition from physiological signals

Dimensional emotion recognition from physiological signals is a highly challenging task. Common methods rely on hand-crafted features that do not yet provide the performance necessary for real-life application. In this work, we exploit a series of convolutional and recurrent neural networks to predict affect from physiological signals, such as electrocardiogram and electrodermal activity, directly from the raw time representation. The motivation behind this so-called end-to-end approach is that, ultimately, the network learns an intermediate representation of the physiological signals that better suits the task at hand. Experimental evaluations show that, this very first study on end-to-end learning of emotion based on physiology, yields significantly better performance in comparison to existing work on the challenging RECOLA database, which includes fully spontaneous affective behaviors displayed during naturalistic interactions. Furthermore, we gain better understanding of the models' inner representations, by demonstrating that some cells' activations in the convolutional network are correlated to a large extent with hand-crafted features.

[1]  Kuldip K. Paliwal,et al.  Bidirectional recurrent neural networks , 1997, IEEE Trans. Signal Process..

[2]  Pavel Matejka,et al.  Multimodal Emotion Recognition for AVEC 2016 Challenge , 2016, AVEC@ACM Multimedia.

[3]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[4]  Daniel McDuff,et al.  Advancements in Noncontact, Multiparameter Physiological Measurements Using a Webcam , 2011, IEEE Transactions on Biomedical Engineering.

[5]  Dongmei Jiang,et al.  Multimodal Affective Dimension Prediction Using Deep Bidirectional Long Short-Term Memory Recurrent Neural Networks , 2015, AVEC@ACM Multimedia.

[6]  Bo Sun,et al.  Exploring Multimodal Visual Features for Continuous Affect Recognition , 2016, AVEC@ACM Multimedia.

[7]  Fabien Ringeval,et al.  AV+EC 2015: The First Affect Recognition Challenge Bridging Across Audio, Video, and Physiological Data , 2015, AVEC@ACM Multimedia.

[8]  Fabien Ringeval,et al.  AVEC 2016: Depression, Mood, and Emotion Recognition Workshop and Challenge , 2016, AVEC@ACM Multimedia.

[9]  N. Frijda,et al.  Emotions and respiratory patterns: review and critical analysis. , 1994, International journal of psychophysiology : official journal of the International Organization of Psychophysiology.

[10]  L. Lin,et al.  A concordance correlation coefficient to evaluate reproducibility. , 1989, Biometrics.

[11]  Daniel McDuff,et al.  Biophone: Physiology monitoring from peripheral smartphone motions , 2015, 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC).

[12]  Thomas S. Huang,et al.  How deep neural networks can improve emotion recognition on video data , 2016, 2016 IEEE International Conference on Image Processing (ICIP).

[13]  C. Nickerson A note on a concordance correlation coefficient to evaluate reproducibility , 1997 .

[14]  Fabien Ringeval,et al.  Continuous Estimation of Emotions in Speech by Dynamic Cooperative Speaker Models , 2017, IEEE Transactions on Affective Computing.

[15]  Björn W. Schuller,et al.  Automatic multi-lingual arousal detection from voice applied to real product testing applications , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[16]  George Trigeorgis,et al.  Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[17]  Patrick Thiam,et al.  Ensemble Methods for Continuous Affect Recognition: Multi-modality, Temporality, and Challenges , 2015, AVEC@ACM Multimedia.

[18]  Murtaza Bulut,et al.  Mobile real-time arousal detection , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[19]  Renaud Séguier,et al.  High-Level Geometry-based Features of Video Modality for Emotion Prediction , 2016, AVEC@ACM Multimedia.

[20]  Qin Jin,et al.  Multi-modal Dimensional Emotion Recognition using Recurrent Neural Networks , 2015, AVEC@ACM Multimedia.

[21]  Jean-Philippe Thiran,et al.  Prediction of asynchronous dimensional emotion ratings from audiovisual and physiological data , 2015, Pattern Recognit. Lett..

[22]  Matthew D. Zeiler ADADELTA: An Adaptive Learning Rate Method , 2012, ArXiv.

[23]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[24]  M. Bradley,et al.  The pupil as a measure of emotional arousal and autonomic activation. , 2008, Psychophysiology.

[25]  Fabien Ringeval,et al.  Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions , 2013, 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).

[26]  Anja Bachmann,et al.  Leveraging smartwatches for unobtrusive mobile ambulatory mood assessment , 2015, UbiComp/ISWC Adjunct.

[27]  Björn W. Schuller,et al.  Automatic recognition of physiological parameters in the human voice: Heart rate and skin conductance , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[28]  Fabien Ringeval,et al.  Discriminatively Trained Recurrent Neural Networks for Continuous Dimensional Emotion Recognition from Audio , 2016, IJCAI.

[29]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[30]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[31]  Björn W. Schuller,et al.  Convolutional Neural Networks with Data Augmentation for Classifying Speakers' Native Language , 2016, INTERSPEECH.

[32]  R. Lazarus Emotion and Adaptation , 1991 .