Statistic prosody structure prediction

Hierarchical prosody structure generation is a key component of a speech synthesis system. This paper presents a statistical method that predicts the prosody structure for a Chinese text-to-speech (TTS) system by combining dynamic programming with rules. The method is based on a manually annotated corpus derived from natural speech (IBM Mandarin TTS Corpus for Female 02). Experimental results show that the method predicts prosodic structure with an accuracy of 91.2%. A state-of-the-art Mandarin TTS system has been built on this hierarchical prosody structure, and listening tests show that the predicted prosody structure performs well.
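The abstract does not specify the statistical model or the rules, so the following is only a minimal sketch of how a dynamic program can combine model scores with a hard rule. It assumes four break levels between adjacent lexical words (0: inside a prosodic word, 1: prosodic-word boundary, 2: prosodic-phrase boundary, 3: intonation-phrase boundary), per-juncture log-scores standing in for a model trained on an annotated corpus, and one hypothetical rule that caps a prosodic word at four syllables. The names `predict_breaks`, `LEVELS`, and `MAX_PW_SYLLABLES` and the example probabilities are illustrative assumptions, not the authors' code or data.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumed break levels at each juncture between adjacent lexical words.
LEVELS = ("none", "PW", "PP", "IP")  # none, prosodic word, prosodic phrase, intonation phrase
# Assumed rule (hypothetical): a prosodic word spans at most 4 syllables.
MAX_PW_SYLLABLES = 4


def predict_breaks(syllables: Sequence[int],
                   scores: Sequence[Sequence[float]],
                   max_pw: int = MAX_PW_SYLLABLES) -> List[int]:
    """Viterbi-style search for the best break level at each of the n-1 junctures.

    syllables[i]: syllable count of lexical word i.
    scores[i][b]: log-score of break level b at the juncture after word i
                  (a stand-in for the output of a statistical model).
    """
    n = len(syllables)
    if n < 2:
        return []
    # DP state: syllable length of the prosodic word that is still open.
    # Each state keeps only its best (total score, labels so far).
    best: Dict[int, Tuple[float, List[int]]] = {syllables[0]: (0.0, [])}
    for i in range(n - 1):
        nxt: Dict[int, Tuple[float, List[int]]] = {}
        for length, (total, labels) in best.items():
            for level, s in enumerate(scores[i]):
                if level == 0:
                    # No break: word i+1 joins the open prosodic word.
                    new_len = length + syllables[i + 1]
                    if new_len > max_pw:  # rule: forbid over-long prosodic words
                        continue
                else:
                    # Any break closes the prosodic word; word i+1 starts a new one.
                    new_len = syllables[i + 1]
                cand = total + s
                if new_len not in nxt or cand > nxt[new_len][0]:
                    nxt[new_len] = (cand, labels + [level])
        best = nxt
    return max(best.values(), key=lambda t: t[0])[1]


# Usage: four words of 2, 1, 2 and 2 syllables; probabilities are made up.
EXAMPLE_SYLLABLES = [2, 1, 2, 2]
EXAMPLE_SCORES = [[math.log(p) for p in row] for row in (
    (0.70, 0.20, 0.05, 0.05),
    (0.60, 0.30, 0.05, 0.05),
    (0.10, 0.30, 0.50, 0.10),
)]
print([LEVELS[b] for b in predict_breaks(EXAMPLE_SYLLABLES, EXAMPLE_SCORES)])
# ['none', 'PW', 'PP']
```

In this toy example the model alone would prefer no break at the first two junctures, but that would merge the first three words into a five-syllable prosodic word, so the rule forces a prosodic-word boundary at the second juncture. Because the DP state records the length of the open prosodic word, the rule is enforced exactly while the search still returns the highest-scoring labelling that satisfies it.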
