Prosody model in a Mandarin text-to-speech system based on a hierarchical approach

The authors developed a prosody model for a Mandarin text-to-speech (TTS) system. Meaningful prosodic parameters are extracted from the speech (voice) files and the corresponding text files and are organized hierarchically. For each syllable, four levels of information are considered: word-level information (initial consonant, vowel, and tone), phrase-level information, breath-group-level information, and sentence-level information. The duration prediction model uses a fifth parameter, punctuation marks, in addition to these four.

In the syllable duration prediction model, 37% of the training syllables (inside test) and 43% of the test syllables (outside test) have a prediction error ratio below 0.1. The average error over all syllables is 0.182 in the inside test and 0.169 in the outside test. In the syllable volume prediction model, 81% of the training syllables (inside test) and 76.2% of the test syllables (outside test) have a prediction error ratio below 0.1. The average error over all syllables is 0.176 in the inside test and 0.166 in the outside test. For the pitch prediction module, 64% of the inside-test samples and 57% of the outside-test samples have a pitch pattern error within 5 Hz. The average pattern error over all syllables is 5 Hz in the inside test and 6 Hz in the outside test.
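The evaluation figures above rest on two error measures that the abstract does not define precisely. The Python sketch below shows one plausible reading. It is an assumption, not the authors' published definition: the duration and volume "error ratio" is taken as |predicted - actual| / actual per syllable, and the pitch "pattern error" is taken as the mean absolute difference in Hz between a predicted and a reference F0 contour sampled at the same points. The function names (`relative_errors`, `summarize_ratio_errors`, `pitch_pattern_error`, `summarize_pitch_errors`) and the toy numbers are hypothetical.

```python
from statistics import mean


def relative_errors(predicted, actual):
    """Per-syllable error ratio |pred - actual| / actual (assumed definition).

    `actual` values are assumed to be positive (e.g. durations in ms).
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return [abs(p - a) / a for p, a in zip(predicted, actual)]


def summarize_ratio_errors(predicted, actual, threshold=0.1):
    """Return (fraction of syllables with error ratio < threshold, mean error ratio)."""
    errs = relative_errors(predicted, actual)
    within = sum(1 for e in errs if e < threshold) / len(errs)
    return within, mean(errs)


def pitch_pattern_error(pred_contour, ref_contour):
    """Mean absolute F0 difference in Hz between two equally sampled contours (assumed)."""
    if len(pred_contour) != len(ref_contour):
        raise ValueError("contours must be sampled at the same points")
    return mean(abs(p - r) for p, r in zip(pred_contour, ref_contour))


def summarize_pitch_errors(pred_contours, ref_contours, tolerance_hz=5.0):
    """Return (fraction of syllables with pattern error <= tolerance_hz, mean pattern error)."""
    errs = [pitch_pattern_error(p, r) for p, r in zip(pred_contours, ref_contours)]
    within = sum(1 for e in errs if e <= tolerance_hz) / len(errs)
    return within, mean(errs)


if __name__ == "__main__":
    # Toy syllable durations in ms (hypothetical data, not from the paper).
    dur_within, dur_mean = summarize_ratio_errors([200, 250, 300, 180], [210, 200, 300, 250])
    print(f"duration: {dur_within:.0%} below 0.1, mean error {dur_mean:.3f}")
    # prints "duration: 50% below 0.1, mean error 0.144"

    # Toy F0 contours in Hz, three sample points per syllable (hypothetical data).
    f0_within, f0_mean = summarize_pitch_errors(
        [[200, 210, 220], [150, 160, 170]],
        [[202, 212, 218], [140, 150, 160]],
    )
    print(f"pitch: {f0_within:.0%} within 5 Hz, mean pattern error {f0_mean:.1f} Hz")
    # prints "pitch: 50% within 5 Hz, mean pattern error 6.0 Hz"
```

Both statistics reported for each model come from the same list of per-syllable errors. This is why a model can have most syllables within tolerance and still show a mean error above the threshold when a minority of syllables have large errors. The volume model's 81% of syllables below 0.1, alongside a mean of 0.176, is an example.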