Corpus-based generation of fundamental frequency contours using generation process model and considering emotional focuses

We previously developed a corpus-based method for generating fundamental frequency (F0) contours from text and applied it to emotional speech synthesis. Instead of directly predicting the F0 value of each time frame, the method predicts the command values of the F0 contour generation process model. In the present work, better control of F0 contours was achieved by taking into account the emotional level of each bunsetsu (the basic phrasal unit of Japanese). Specifically, information on which bunsetsu(s) the emotion is especially placed on was added to the inputs of the command predictor. For anger, adding emotional levels yielded F0 contours closer to the target contours. Speech was then synthesized with F0 contours generated in two ways: from commands predicted with emotional levels taken into account, and from commands predicted without them. The results of a perceptual experiment indicated that emotion was conveyed well when emotional levels were added.

Index Terms: speech synthesis, emotion, F0 contour
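For readers unfamiliar with the F0 contour generation process model (the Fujisaki model), the following is a minimal Python sketch of how predicted commands turn into a contour. It assumes the standard formulation ln F0(t) = ln Fb + Σ Ap·Gp(t - T0) + Σ Aa·[Ga(t - T1) - Ga(t - T2)], in which phrase commands are impulses of magnitude Ap and accent commands are pedestals of amplitude Aa. The constants α = 3.0 1/s, β = 20.0 1/s and γ = 0.9 are commonly used values assumed here for illustration, not values reported in this paper, and the function names and example commands are hypothetical.

```python
import math

# Assumed constants: values commonly used with the Fujisaki model, not taken from this paper.
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent control mechanism (1/s)
GAMMA = 0.9   # ceiling level of the accent component


def phrase_response(t: float, alpha: float = ALPHA) -> float:
    """Impulse response Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t: float, beta: float = BETA, gamma: float = GAMMA) -> float:
    """Step response Ga(t) = min(1 - (1 + beta * t) * exp(-beta * t), gamma) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_at(t: float, fb: float, phrase_cmds, accent_cmds) -> float:
    """Return F0 in Hz at time t (seconds).

    fb:          baseline frequency Fb in Hz
    phrase_cmds: list of (t0, ap) pairs: onset time and magnitude of each phrase command
    accent_cmds: list of (t1, t2, aa) triples: onset, offset and amplitude of each accent command
    """
    log_f0 = math.log(fb)  # the model adds components in the log-frequency domain
    for t0, ap in phrase_cmds:
        log_f0 += ap * phrase_response(t - t0)
    for t1, t2, aa in accent_cmds:
        log_f0 += aa * (accent_response(t - t1) - accent_response(t - t2))
    return math.exp(log_f0)


if __name__ == "__main__":
    # Hypothetical commands for a short utterance: one phrase command, two accent commands.
    phrases = [(0.0, 0.5)]
    accents = [(0.2, 0.5, 0.4), (0.8, 1.1, 0.3)]
    # Sample the contour every 10 ms over 1.5 s.
    contour = [f0_at(i * 0.01, 120.0, phrases, accents) for i in range(150)]
    print(f"max F0 = {max(contour):.1f} Hz")
```

The sketch shows how command prediction differs from frame-wise prediction: a handful of command timings and magnitudes per utterance determine the value of every frame of the contour.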

Keikichi Hirose, Nobuaki Minematsu, Yasufumi Asano