End-to-end label uncertainty modeling for speech emotion recognition using Bayesian neural networks

Emotions are subjective constructs. Recent end-to-end speech emotion recognition systems are typically agnostic to the subjective nature of emotions, despite their state-of-the-art performance. In this work, we introduce an end-to-end Bayesian neural network architecture to capture the inherent subjectivity in emotions. To the best of our knowledge, this work is the first to use Bayesian neural networks for speech emotion recognition. During training, the network learns a distribution over weights to capture the inherent uncertainty related to subjective emotion annotations. For this, we introduce a loss term that enables the model to be trained explicitly on a distribution of emotion annotations, rather than exclusively on mean or gold-standard labels. We evaluate the proposed approach on the AVEC’16 emotion recognition dataset. Qualitative and quantitative analyses of the results reveal that the proposed model can aptly capture the distribution of subjective emotion annotations, with a compromise between mean and standard deviation estimates.
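
The two ideas in the abstract, sampling weights from a learned distribution and scoring the resulting prediction distribution against the distribution of annotations, can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the loss, so the Gaussian KL divergence used here is an assumption, as are the one-layer linear model, the softplus parameterization of the weight standard deviation, and every name and number in the code (`gaussian_kl`, `sample_predictions`, the annotator ratings, the input features).

```python
import numpy as np


def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """KL(N(mu_p, sigma_p^2) || N(mu_q, sigma_q^2)) for univariate Gaussians."""
    return (np.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)


def sample_predictions(x, w_mu, w_rho, n_samples, rng):
    """Monte Carlo forward passes through a toy linear model with Gaussian weights.

    Each weight is drawn as w = mu + softplus(rho) * eps with eps ~ N(0, 1),
    a common parameterization in variational Bayesian networks (assumed here).
    """
    w_sigma = np.log1p(np.exp(w_rho))                  # softplus keeps sigma > 0
    eps = rng.standard_normal((n_samples, w_mu.shape[0]))
    weights = w_mu + w_sigma * eps                     # shape: (n_samples, n_features)
    return weights @ x                                 # shape: (n_samples,)


rng = np.random.default_rng(0)

# Hypothetical ratings of one frame by several annotators (illustrative values).
annotations = np.array([0.10, 0.25, 0.30, 0.15, 0.40, 0.20])

# Hypothetical input features and variational weight parameters.
x = np.array([0.5, -1.0, 2.0])
w_mu = np.array([0.2, 0.1, 0.1])
w_rho = np.full(3, -3.0)

preds = sample_predictions(x, w_mu, w_rho, n_samples=200, rng=rng)

# Assumed loss: divergence between the annotation distribution and the
# distribution of sampled predictions, both summarized as Gaussians.
loss = gaussian_kl(annotations.mean(), annotations.std(),
                   preds.mean(), preds.std())
print(f"distribution-matching loss: {loss:.4f}")
```

In a trainable version, `w_mu` and `w_rho` would be optimized by gradient descent, typically together with a prior-regularization term on the weights; the sketch only shows how a loss over annotation distributions differs from regressing on the mean label alone.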
