Unveiling the Acoustic Properties that Describe the Valence Dimension

One of the main challenges in emotion recognition from speech is discriminating emotions along the valence dimension (positive versus negative). While acoustic features characterize the activation/arousal dimension well (excited versus calm), they usually fail to discriminate between sentences with different valence attributes (e.g., happiness versus anger). This paper focuses on the valence dimension, which is key in many behavioral problems (e.g., depression). First, a regression analysis is conducted to identify the most informative features: separate support vector regression (SVR) models are trained with various feature groups. The results reveal that spectral and F0 features produce the most accurate predictions of valence. Then, sentences with similar activation but different valence are studied in detail. The discriminative power of individual features in the valence domain is assessed with logistic regression analysis. This controlled experiment reveals differences between positive and negative emotions in the F0 distribution (e.g., positive skewness), and it also uncovers characteristic trends in the spectral domain.
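
The two-step methodology summarized above can be illustrated with a minimal sketch, assuming scikit-learn and SciPy. The F0 contours, valence labels, and feature names below are synthetic placeholders, not the paper's corpus or feature set; the spectral features, which the paper finds similarly informative, are omitted for brevity.

```python
import numpy as np
from scipy.stats import skew
from sklearn.svm import SVR
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def f0_statistics(f0_contour):
    """Sentence-level F0 statistics: mean, std, skewness, range."""
    f0 = f0_contour[f0_contour > 0]  # keep voiced frames only
    return np.array([f0.mean(), f0.std(), skew(f0), np.ptp(f0)])

# Placeholder corpus: 200 synthetic F0 contours (Hz) and random valence
# labels in [-1, 1]; real data would come from an emotional speech corpus.
contours = rng.normal(loc=180.0, scale=30.0, size=(200, 100))
X = np.array([f0_statistics(c) for c in contours])
valence = rng.uniform(-1.0, 1.0, size=200)

# Step 1 (regression analysis): train an SVR on one feature group and
# score it with cross-validated R^2. On random labels the score is
# uninformative; only the pipeline is illustrated.
r2 = cross_val_score(SVR(kernel="rbf", C=1.0), X, valence, cv=5, scoring="r2")
print(f"SVR valence prediction, mean R^2: {r2.mean():.3f}")

# Step 2 (logistic regression): test each feature individually for its
# power to separate positive from negative valence.
y = (valence > 0).astype(int)  # binarize valence labels
for i, name in enumerate(["f0_mean", "f0_std", "f0_skew", "f0_range"]):
    acc = cross_val_score(LogisticRegression(), X[:, [i]], y, cv=5).mean()
    print(f"{name}: classification accuracy = {acc:.3f}")
```

Per-feature logistic regression mirrors the paper's controlled analysis: rather than ranking features inside a joint model, each feature is scored in isolation, so its accuracy directly reflects how well it alone separates positive from negative valence.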
