On the robustness of overall F0-only modifications to the perception of emotions in speech

Emotional information in speech is commonly described in terms of prosodic features such as F0, duration, and energy. This paper focuses on how F0 characteristics can be used to effectively parametrize the emotional quality of speech signals. Using an analysis-by-synthesis approach, the F0 mean, range, and shape of emotional utterances are systematically modified. The results identify which aspects of F0 can be modified without significantly changing the perceived emotion. To model this behavior, the concept of emotional regions is introduced. Emotional regions represent the variability present in emotional speech and provide a new procedure for studying the speech cues that underlie judgments of emotion. The method is applied here to F0 but can also be applied to other aspects of prosody, such as duration or loudness. A statistical analysis of the factors affecting the emotional regions and a discussion of how F0 modifications affect the perception of emotion and of speech quality are also presented. The results show that F0 range is more important than F0 mean for the expression of emotion.
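As a rough illustration of what modifying the F0 mean and range of an utterance involves, the following is a minimal sketch that operates on a frame-wise F0 contour (in Hz, with 0 marking unvoiced frames). It assumes the modifications are applied in the log-frequency domain around the geometric mean of the voiced frames, which keeps the output positive; the function `modify_f0` and its parameters `mean_factor` and `range_factor` are hypothetical, and the abstract does not specify the exact parametrization or resynthesis method used. Resynthesizing audio from the modified contour (for example with a PSOLA-based tool such as Praat) is outside the scope of the sketch.

```python
import math
from typing import List


def modify_f0(f0: List[float], mean_factor: float = 1.0,
              range_factor: float = 1.0) -> List[float]:
    """Hypothetical sketch: shift the F0 mean and scale the F0 range.

    Frames with f0 <= 0 are treated as unvoiced and returned unchanged.
    All voiced-frame arithmetic is done in the log domain (an assumption),
    so the modified contour always stays positive.
    """
    voiced = [f for f in f0 if f > 0]
    if not voiced:
        return list(f0)  # nothing to modify in a fully unvoiced contour

    # Geometric mean of the voiced frames = arithmetic mean of log F0.
    log_mean = sum(math.log(f) for f in voiced) / len(voiced)
    # Shifting the mean by a multiplicative factor is an additive log shift.
    new_log_mean = log_mean + math.log(mean_factor)

    out = []
    for f in f0:
        if f <= 0:
            out.append(f)  # keep the unvoiced marker as-is
        else:
            # Scale this frame's deviation from the mean (range change),
            # then re-center it on the shifted mean (mean change).
            dev = math.log(f) - log_mean
            out.append(math.exp(new_log_mean + range_factor * dev))
    return out
```

With `mean_factor=2.0` every voiced frame is doubled while the contour shape is preserved, whereas `range_factor=0.0` flattens the voiced frames to their geometric mean (a monotone rendition); values between 0 and 1 compress the range and values above 1 expand it.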
