An acoustic study of emotions expressed in speech

In this study, we investigate acoustic properties of speech associated with four emotion categories (sadness, anger, happiness, and neutral) intentionally expressed in speech by an actress. The aim is to obtain detailed acoustic knowledge of how speech is modulated when the speaker's state changes from neutral to a particular emotion. The analysis is based on measurements of acoustic parameters related to speech prosody, vowel articulation, and spectral energy distribution. Acoustic similarities and differences among the emotions are then explored with mutual information computation, multidimensional scaling, and comparison of acoustic likelihoods relative to the neutral emotion. In addition, acoustic separability of the emotions is tested using discriminant analysis at the utterance level, and the results are compared with human evaluation. Results show that happiness/anger and neutral/sadness share similar acoustic properties for this speaker. Speech associated with anger and happiness is characterized by longer utterance duration, shorter inter-word silences, and higher pitch and energy values with wider ranges, reflecting exaggerated or hyperarticulated speech. The discriminant analysis indicates that within-group acoustic separability is relatively poor, suggesting that the conventional acoustic parameters examined in this study are not effective in describing the emotions along the valence (or pleasure) dimension. RMS energy, inter-word silence, and speaking rate are useful in distinguishing sadness from the other emotions. Interestingly, the between-group difference in formant patterns seems better reflected in back vowels such as /a/ (as in "father") than in the front vowels. Larger lip opening and/or more tongue constriction at the mid or rear part of the vocal tract could be the underlying reasons.
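As a rough illustration of the utterance-level discriminant analysis described above, the following Python sketch classifies utterances into the four emotion categories from a handful of prosodic features. The feature matrix, labels, and feature set here are placeholders, not the paper's data or code; it is only a minimal sketch of how such a separability test might be set up.

```python
# Illustrative sketch (not the authors' implementation): utterance-level
# discriminant analysis of four emotion categories from prosodic features.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical feature columns, one row per utterance: mean F0, F0 range,
# RMS energy, utterance duration, inter-word silence, speaking rate.
n_utterances, n_features = 200, 6
X = rng.normal(size=(n_utterances, n_features))   # placeholder acoustic features
y = rng.integers(0, 4, size=n_utterances)         # 0=neutral, 1=sad, 2=angry, 3=happy

# Cross-validated classification accuracy as a proxy for acoustic separability.
lda = LinearDiscriminantAnalysis()
scores = cross_val_score(lda, X, y, cv=5)
print(f"Mean cross-validated accuracy: {scores.mean():.2f}")
```

With real measurements in place of the random placeholders, per-class confusion patterns (rather than overall accuracy) would be the quantity to compare against human evaluation.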
