Consistency of base frequency labelling for the F0 contour generation model using expressive emotional speech corpora

To investigate the consistency of base frequency (Fb) labelling in the F0 contour generation model for expressive and/or authentic emotional speech, an Fb labelling experiment was conducted with three trained labellers using a parallel corpus of emotional speech, the Online-gaming voice chat corpus with emotional labelling (OGVC). Twenty-four utterances from the spontaneous dialog speech and the emotion-acted speech in the OGVC were labelled with Fb, phrase commands, and accent commands by the three labellers. A repeated-measures analysis of variance was performed on the Fb value of each utterance, with corpus type, gender, speaker, emotion, and labeller as factors. The results show significant main effects of gender, speaker, and emotion, and a significant interaction between speaker and emotion. The results also indicate that the value of Fb varied when different emotions were expressed, even when the utterances came from the same speaker. Moreover, close inspection of the Fb of each utterance suggests that Fb also varied when the linguistic content of the utterances differed, even if the same emotion was expressed in those utterances.
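
The abstract does not spell out the model itself. As a reading aid, the sketch below shows the commonly cited form of the command-response (Fujisaki) model, in which ln F0(t) is the sum of ln Fb, the phrase components, and the accent components. It is a minimal illustration under stated assumptions, not the labelling procedure used in the study: the constants ALPHA, BETA, and GAMMA and all function names are illustrative choices, not values taken from the paper.

```python
import math

# Assumed, illustrative constants (not taken from the paper).
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent control mechanism (1/s)
GAMMA = 0.9   # ceiling of the accent component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism; zero before onset."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, clipped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(t, fb, phrase_cmds, accent_cmds):
    """Return F0 in Hz at time t (seconds).

    fb          -- base frequency Fb in Hz
    phrase_cmds -- list of (onset_time, magnitude) pairs
    accent_cmds -- list of (onset_time, offset_time, amplitude) triples
    """
    log_f0 = math.log(fb)
    for t0, ap in phrase_cmds:
        log_f0 += ap * phrase_response(t - t0)
    for t1, t2, aa in accent_cmds:
        log_f0 += aa * (accent_response(t - t1) - accent_response(t - t2))
    return math.exp(log_f0)
```

Because Fb enters as an additive constant in the log domain, changing Fb rescales the whole contour by a fixed ratio. A labeller can therefore trade a different Fb against different command magnitudes when fitting the same observed contour, which is why the consistency of Fb labelling is worth examining.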
