Analysis of emotional speech prosody in terms of part-of-speech tags

Representing emotions in terms of the acoustic features of well-defined lexical elements is desirable for developing emotional speech processing systems. To this end, this paper investigates the interaction between emotions and part-of-speech (POS) tags. Utterances from three speakers in angry, happy, sad, and neutral emotions are used to statistically analyze the effects of four factors (emotion, POS tag type, tag position, and speaker) on tag duration, energy, and F0. The main effects of emotion, tag type, and position are found to be significant. The results also show that the effect of emotion depends significantly on position but not on POS tag type. The effect of position is pronounced: POS tags located in the first half of a sentence have shorter durations, higher energy, and higher F0 values.
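The abstract does not name the statistical test used. A factorial analysis of variance (ANOVA) is one standard way to test main effects and interactions of the kind reported here, so the following Python sketch assumes it and restricts itself to two factors, emotion and tag position, on a single prosodic variable. The function names `position_label` and `two_way_anova`, the midpoint rule for assigning a tag to the first or second half of a sentence, and the F0 values are hypothetical illustrations, not the paper's procedure or data.

```python
from itertools import product
from statistics import mean


def position_label(tag_start, tag_end, sentence_duration):
    """Hypothetical rule: a tag belongs to the 'first' half of a sentence
    if its temporal midpoint falls before the sentence midpoint."""
    midpoint = (tag_start + tag_end) / 2
    return "first" if midpoint < sentence_duration / 2 else "second"


def two_way_anova(cells):
    """Balanced two-way ANOVA with interaction.

    `cells` maps (level_a, level_b) -> list of observations (e.g. mean F0
    of each tag). Every cell must hold the same number n >= 2 of values,
    and each factor needs at least two levels.
    Returns {effect: (sum_of_squares, degrees_of_freedom, F)}; the error
    term has F = None.
    """
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    p, q = len(levels_a), len(levels_b)
    if p < 2 or q < 2:
        raise ValueError("each factor needs at least two levels")
    sizes = {len(cells.get((a, b), [])) for a, b in product(levels_a, levels_b)}
    if len(sizes) != 1 or min(sizes) < 2:
        raise ValueError("need a balanced design with n >= 2 per cell")
    n = sizes.pop()

    cell_mean = {k: mean(v) for k, v in cells.items()}
    grand = mean(x for v in cells.values() for x in v)
    mean_a = {a: mean(cell_mean[(a, b)] for b in levels_b) for a in levels_a}
    mean_b = {b: mean(cell_mean[(a, b)] for a in levels_a) for b in levels_b}

    # Sums of squares for the two main effects, their interaction, and error.
    ss_a = q * n * sum((mean_a[a] - grand) ** 2 for a in levels_a)
    ss_b = p * n * sum((mean_b[b] - grand) ** 2 for b in levels_b)
    ss_ab = n * sum(
        (cell_mean[(a, b)] - mean_a[a] - mean_b[b] + grand) ** 2
        for a, b in product(levels_a, levels_b)
    )
    ss_err = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)

    df_a, df_b = p - 1, q - 1
    df_ab, df_err = df_a * df_b, p * q * (n - 1)
    ms_err = ss_err / df_err  # assumes some within-cell variation
    return {
        "a": (ss_a, df_a, (ss_a / df_a) / ms_err),
        "b": (ss_b, df_b, (ss_b / df_b) / ms_err),
        "ab": (ss_ab, df_ab, (ss_ab / df_ab) / ms_err),
        "error": (ss_err, df_err, None),
    }


# Made-up tag-level mean F0 values in Hz, two tags per (emotion, position) cell.
example = {
    ("angry", "first"): [210.0, 214.0],
    ("angry", "second"): [180.0, 184.0],
    ("neutral", "first"): [190.0, 194.0],
    ("neutral", "second"): [178.0, 182.0],
}

if __name__ == "__main__":
    for effect, (ss, df, f) in two_way_anova(example).items():
        print(effect, ss, df, f)
```

In this framing, a large F for the `ab` term is what "the effect of emotion depends on position" would look like. P-values would come from the F distribution with (df_effect, df_error) degrees of freedom, for example `scipy.stats.f.sf(F, df_effect, df_error)`. The analysis described in the abstract also includes POS tag type and speaker as factors, and unequal tag counts per cell would call for a regression-based ANOVA rather than this balanced-design formula.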
