Pitch contour model for Chinese text-to-speech using CART and statistical model
This paper describes an approach to generating prosody parameters for a Mandarin Chinese text-to-speech system. The fundamental frequency (F0) contour is decomposed into two parts: a global intonation contour and a syllable-level tone contour. In the corpus, the global intonation contour is converted to pitch target labels. To generate it, the pitch target labels are first predicted with a statistical model and a classification tree, and the labels are then converted into actual pitch values. The local syllable-level tone contours are grouped into a fixed number of contour types by clustering, and the contour type of each syllable is predicted with a classification tree. Experiments show that this approach predicts pitch contours accurately.
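The decomposition and clustering steps can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how the global and local components are separated or which clustering algorithm is used. The sketch assumes that the per-syllable mean F0 stands in for the global component, that the mean-removed residual is the local tone shape, and that plain k-means with a deterministic farthest-point initialisation does the clustering. All function names and the F0 values are hypothetical.

```python
from typing import List, Sequence, Tuple

Contour = List[float]


def split_contour(f0: Sequence[float]) -> Tuple[float, Contour]:
    """Split one syllable's F0 samples into a level (mean) and a residual shape.

    Assumption: the mean is a stand-in for the global intonation component.
    """
    level = sum(f0) / len(f0)
    return level, [v - level for v in f0]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length contours."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(shapes: Sequence[Contour], centroids: Sequence[Contour]) -> List[int]:
    """Label each shape with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: sq_dist(s, centroids[j]))
        for s in shapes
    ]


def kmeans(shapes: Sequence[Contour], k: int, iters: int = 20):
    """Cluster residual shapes into k contour types (plain k-means)."""
    # Deterministic farthest-point initialisation, so results are repeatable.
    centroids = [list(shapes[0])]
    while len(centroids) < k:
        far = max(shapes, key=lambda s: min(sq_dist(s, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        labels = assign(shapes, centroids)
        for j in range(k):
            members = [s for s, lab in zip(shapes, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign(shapes, centroids), centroids


def reconstruct(level: float, centroid: Sequence[float]) -> Contour:
    """Turn a predicted level and contour type back into pitch values."""
    return [level + v for v in centroid]


# Hypothetical syllables, four F0 samples each (Hz): rising, falling, level.
syllables = [
    [200, 210, 220, 230], [230, 220, 210, 200], [200, 200, 200, 200],
    [150, 160, 170, 180], [180, 170, 160, 150], [150, 150, 150, 150],
]
levels, shapes = zip(*(split_contour(s) for s in syllables))
labels, centroids = kmeans(list(shapes), k=3)
```

In a full system along the lines the abstract describes, the cluster labels would become the training targets for a classification tree, whose predicted contour type is then combined with the predicted global pitch value, as `reconstruct` does here.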