Paper information - Automatic prosody labeling using both text and acoustic information

Automatic prosody labeling using both text and acoustic information

Prosody is an important factor in a high-quality text-to-speech (TTS) system. Prosody is often described with a hierarchical structure, so generating that hierarchical prosody structure is very important both in corpus building and in run-time text analysis. However, manual prosody labeling is laborious and time-consuming. Moreover, it is difficult to keep labels consistent between different labelers, and even for the same labeler at different times. This paper presents an automatic prosody labeling system that uses the decision tree plus Viterbi decoding framework proposed by Wightman and Ostendorf (1994). The system uses not only acoustic information but also text information, such as the part-of-speech (POS) of a word. A prosody model for our Mandarin TTS system is built using the automatically labeled corpus. A listening test shows that the automatic prosody labeling system works pretty well.
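To make the "decision tree plus Viterbi decoding" idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes a decision tree has already produced, for each word, a posterior over two hypothetical labels (`N` for no boundary, `B` for a prosodic boundary) from acoustic and text features such as POS. Dividing each posterior by the label prior to obtain a scaled likelihood is an assumption of this sketch, and every probability below is invented for illustration.

```python
import math


def _log(p):
    # Floor probabilities so that log(0) never occurs.
    return math.log(max(p, 1e-12))


def viterbi(posteriors, transitions, priors):
    """Return the most likely label sequence.

    posteriors:  one dict per word, label -> P(label | features);
                 assumed here to come from a decision tree leaf
    transitions: dict (prev_label, label) -> P(label | prev_label)
    priors:      dict label -> P(label)
    """
    labels = list(priors)

    def emit(t, y):
        # Scaled likelihood: posterior divided by prior (sketch assumption).
        return _log(posteriors[t][y]) - _log(priors[y])

    score = {y: _log(priors[y]) + emit(0, y) for y in labels}
    backpointers = []
    for t in range(1, len(posteriors)):
        new_score, pointers = {}, {}
        for y in labels:
            best_prev = max(
                labels, key=lambda p: score[p] + _log(transitions[(p, y)])
            )
            new_score[y] = (
                score[best_prev] + _log(transitions[(best_prev, y)]) + emit(t, y)
            )
            pointers[y] = best_prev
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final label.
    path = [max(labels, key=lambda y: score[y])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical numbers for a three-word utterance.
priors = {"N": 0.5, "B": 0.5}
transitions = {
    ("N", "N"): 0.6, ("N", "B"): 0.4,
    ("B", "N"): 0.9, ("B", "B"): 0.1,
}
posteriors = [
    {"N": 0.90, "B": 0.10},
    {"N": 0.45, "B": 0.55},
    {"N": 0.40, "B": 0.60},
]

path = viterbi(posteriors, transitions, priors)
greedy = [max(p, key=p.get) for p in posteriors]
print("Viterbi:", path)
print("Per-word argmax:", greedy)
```

In this toy setup the per-word argmax would place boundaries after both the second and third words, while the sequence decoder keeps only one of them because two adjacent boundaries are given a low transition probability. This is the kind of context effect the sequence model is meant to capture.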

Wei Zhang | Weibin Zhu | Qin Shi | Xijun Ma | Li Qin Shen

[1] Haiping Li, et al. Generating script using statistical information of the context variation unit vector, 2002, INTERSPEECH.

[2] Ann K. Syrdal, et al. Perceptually based automatic prosody labeling and prosodically enriched unit selection improve concatenative text-to-speech synthesis, 2000, INTERSPEECH.

[3] Wei Zhang, et al. Statistic prosody structure prediction, 2002, Proceedings of 2002 IEEE Workshop on Speech Synthesis, 2002.

[4] Mari Ostendorf, et al. Automatic labeling of prosodic patterns, 1994, IEEE Trans. Speech Audio Process.

[5] Wei Zhang, et al. Corpus building for data-driven TTS systems, 2002, Proceedings of 2002 IEEE Workshop on Speech Synthesis, 2002.

[6] Giuseppe Riccardi, et al. Prosody recognition from speech utterances using acoustic and linguistic based models of prosodic events, 1999, EUROSPEECH.

[7] Mari Ostendorf, et al. ToBI: a standard for labeling English prosody, 1992, ICSLP.