Learning Spontaneity to Improve Emotion Recognition in Speech

We investigate the effect and usefulness of spontaneity in speech (i.e., whether a given speech sample is spontaneous or not) in the context of emotion recognition. We hypothesize that the emotional content of speech is interrelated with its spontaneity, and we therefore propose to use spontaneity classification as an auxiliary task for emotion recognition. We propose two supervised learning settings that use spontaneity to improve speech emotion recognition: a hierarchical model that performs spontaneity detection before emotion recognition, and a multitask learning model that jointly learns to recognize both spontaneity and emotion. Through experiments on a benchmark database, we show that using spontaneity as additional information yields a significant improvement (3%) over systems that are unaware of spontaneity. We also observe that spontaneity information is especially useful for recognizing positive emotions, where recognition accuracy improves by 12%.
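The two settings described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the weighted-sum form of the joint loss, the `aux_weight` parameter, and the routing of inputs to one emotion model per spontaneity class are all assumptions made for illustration only.

```python
import math
from typing import Callable, Dict, Sequence


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Negative log-likelihood of the true label under predicted probabilities."""
    return -math.log(max(probs[label], 1e-12))


def multitask_loss(
    emotion_probs: Sequence[float],
    emotion_label: int,
    spont_probs: Sequence[float],
    spont_label: int,
    aux_weight: float = 0.5,  # hypothetical weight on the auxiliary task
) -> float:
    """Joint objective: emotion loss plus a weighted spontaneity loss (assumed form)."""
    main = cross_entropy(emotion_probs, emotion_label)
    aux = cross_entropy(spont_probs, spont_label)
    return main + aux_weight * aux


def hierarchical_predict(
    features: Sequence[float],
    spontaneity_detector: Callable[[Sequence[float]], bool],
    emotion_models: Dict[bool, Callable[[Sequence[float]], str]],
) -> str:
    """Detect spontaneity first, then apply the emotion model for that class."""
    is_spontaneous = spontaneity_detector(features)
    return emotion_models[is_spontaneous](features)
```

In the multitask sketch both tasks share one objective, so a shared representation would be shaped by spontaneity labels as well as emotion labels; in the hierarchical sketch the spontaneity decision is made first and only selects which emotion model is applied.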
