Paper Information - Improved Prediction of Tone Components for F0 Contour Generation of Mandarin Speech Based on the Tone Nucleus Model

We improve the prediction of tone components in our method for synthesizing sentence fundamental frequency (F0) contours of Mandarin speech. The method represents the logarithmic F0 contour of a sentence as a superposition of tone components on phrase components, as in the generation process model (F0 model). The tone components are realized by concatenating their fragments at tone nuclei, which are predicted by a corpus-based method, while the phrase components are generated by rule within the F0 model framework. The original method assumes that tone components have shapes similar to those of the F0 contours at tone nuclei. This in turn rests on the assumption that the phrase components are almost flat throughout an utterance, which does not hold, especially in the initial parts of phrase components. To cope with this problem, the parameters representing the tone components at tone nuclei are modified. In addition, parameters predicted in earlier processes are used for prediction in the processes that follow. A listening test on speech synthesized with F0 contours generated by our methods and by an HMM-based method confirmed the advantage of our methods, especially the improved version.
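The superposition described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the standard phrase-control impulse response of the generation process model, Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, and treats the tone component as a placeholder function. The function names, the value alpha = 3.0, the base frequency, and the phrase command values are all illustrative assumptions, not figures from the paper.

```python
import math

ALPHA = 3.0  # assumed natural angular frequency of the phrase control mechanism (1/s)


def phrase_component(t, onset, magnitude, alpha=ALPHA):
    """Response to one phrase command: magnitude * alpha^2 * dt * exp(-alpha * dt)."""
    dt = t - onset
    if dt < 0:
        return 0.0  # no contribution before the command occurs
    return magnitude * alpha * alpha * dt * math.exp(-alpha * dt)


def phrase_sum(t, phrase_commands):
    """Sum of all phrase components at time t; commands are (onset, magnitude) pairs."""
    return sum(phrase_component(t, onset, mag) for onset, mag in phrase_commands)


def log_f0(t, base_f0, phrase_commands, tone_component):
    """ln F0(t) = ln Fb + phrase components + tone component (superposition)."""
    return math.log(base_f0) + phrase_sum(t, phrase_commands) + tone_component(t)


def tone_residual(observed_log_f0, t, base_f0, phrase_commands):
    """Tone component left after removing the baseline and phrase components."""
    return observed_log_f0 - math.log(base_f0) - phrase_sum(t, phrase_commands)


def example_tone(t):
    """Hypothetical tone component: a constant offset, used only for illustration."""
    return 0.2


commands = [(0.0, 0.5)]  # one hypothetical phrase command at t = 0 s
near_onset = phrase_sum(1.0 / ALPHA, commands)  # peak of the phrase component
late = phrase_sum(2.0, commands)                # well after the onset
observed = log_f0(1.0 / ALPHA, 120.0, commands, example_tone)
recovered = tone_residual(observed, 1.0 / ALPHA, 120.0, commands)
```

In this sketch the phrase component is far from flat shortly after the command onset (`near_onset` is much larger than `late`), so taking the observed F0 shape at a tone nucleus as the tone component would fold part of the phrase component into it. Subtracting the phrase component first, as `tone_residual` does, recovers the assumed tone value.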

Keikichi Hirose | Nobuaki Minematsu | Qinghua Sun
