Two-step generation of Mandarin F0 contours based on tone nucleus and superpositional models

We developed a two-step scheme for synthesizing sentence fundamental frequency (F0) contours of Mandarin speech. The method represents a sentence's logarithmic F0 contour as a superposition of tone components on phrase components, as in the generation process model (F0 model). The tone components are realized by concatenating tone nucleus F0 patterns generated by a corpus-based method, while the phrase components are generated by rules within the F0 model framework. In the two-step scheme, the phrase components are generated first, and information about them is added to the inputs used to predict the tone nucleus F0 patterns. Listening tests on speech synthesized with the generated F0 contours verified the validity of the scheme. For comparison, we also generated F0 contours without decomposing them into tone and phrase components, as most existing methods do. In terms of the naturalness of the synthetic speech, the results showed no clear advantage for the proposed method; in terms of flexibility, however, the advantage was clear: manipulating the phrase components in the proposed method realized better focus control.
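The two-step flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation. The phrase component uses the standard phrase-command impulse response of the generation process model, alpha^2 * t * exp(-alpha * t). The value of `ALPHA`, the table `TOY_TONE_TARGETS`, the function `predict_tone_nucleus`, and the scaling rule inside it are all hypothetical placeholders for the rule-based and corpus-based parts, which the abstract does not specify.

```python
import math

ALPHA = 3.0  # assumed natural angular frequency of the phrase mechanism (1/s)

def phrase_response(t, alpha=ALPHA):
    """Phrase-command impulse response: alpha^2 * t * exp(-alpha * t) for t >= 0."""
    if t < 0.0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)

def phrase_component(times, commands, alpha=ALPHA):
    """Step 1: sum the responses of (onset_time, magnitude) phrase commands."""
    return [sum(mag * phrase_response(t - onset, alpha) for onset, mag in commands)
            for t in times]

# Hypothetical (start, end) log-F0 offsets per Mandarin tone; illustrative values only.
TOY_TONE_TARGETS = {1: (0.25, 0.25), 2: (-0.05, 0.25), 3: (-0.20, -0.25), 4: (0.30, -0.15)}

def predict_tone_nucleus(tone, phrase_value):
    """Placeholder for the corpus-based predictor, which takes phrase information as input.

    The scaling rule below is arbitrary and only shows where that input enters.
    """
    start, end = TOY_TONE_TARGETS[tone]
    scale = 1.0 + 0.5 * phrase_value
    return start * scale, end * scale

def tone_component(times, syllables, phrase):
    """Step 2: fill each tone nucleus (start, end, tone) with a linear log-F0 pattern.

    Frames outside any nucleus are left at 0.0 in this sketch.
    """
    tone_vals = [0.0] * len(times)
    for n_start, n_end, tone in syllables:
        idx = [i for i, t in enumerate(times) if n_start <= t <= n_end]
        if not idx:
            continue
        mean_phrase = sum(phrase[i] for i in idx) / len(idx)
        start, end = predict_tone_nucleus(tone, mean_phrase)
        for i in idx:
            frac = (times[i] - n_start) / (n_end - n_start)
            tone_vals[i] = start + frac * (end - start)
    return tone_vals

def synthesize_f0(times, base_f0_hz, commands, syllables):
    """Superpose in the log domain: ln F0 = ln Fb + phrase + tone; return F0 in Hz."""
    phrase = phrase_component(times, commands)       # step 1: phrase components first
    tone = tone_component(times, syllables, phrase)  # step 2: tone nuclei, given phrase info
    return [base_f0_hz * math.exp(p + c) for p, c in zip(phrase, tone)]

times = [i * 0.01 for i in range(100)]
f0 = synthesize_f0(times, 120.0, [(0.0, 0.4)], [(0.10, 0.25, 1), (0.40, 0.55, 4)])
```

Because the phrase commands are kept separate from the tone patterns, changing a command's magnitude or onset in `commands` alters the sentence-level contour without retraining the tone predictor, which is the kind of manipulation the abstract associates with focus control.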
