Paper information - Script Design Based on Decision Tree with Context Vector and Acoustic Distance for Mandarin TTS

The performance of a TTS system depends on the acoustic features of its output speech, yet only the linguistic information in the text context is available during the corpus design phase. To reconcile this conflict, this paper proposes a script design method based on decision trees. First, trees are trained on an existing speech corpus: the splitting questions are selected from context vectors, and the distance metric is based on acoustic features. Then units from a new text corpus, which will be the source of the target script, are inserted into the trees. The sentence selection strategy combines coverage of major tree nodes with coverage of frequent context vector clusters. An experiment is carried out by collecting a Beijing Olympics related corpus, using an existing domain-unspecified speech corpus and supplementary domain-specific text. An informal listening test on TTS outputs confirms that the proposed method achieves a better optimization of speech. The method accounts not only for phonetic balance, as conventional methods do, but for prosodic balance as well.
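The sentence selection step can be illustrated with a plain greedy set-cover heuristic, in the spirit of the greedy methods commonly used for script design. The sketch below is not the paper's algorithm: it assumes each candidate sentence has already been mapped to the set of tree-node labels its units fall into, and it ignores the frequency weighting and context vector clusters that the abstract mentions. The function name `greedy_select`, the sentence ids, and the node labels are all hypothetical.

```python
def greedy_select(sentences, max_sentences=None):
    """Pick sentences greedily so that each pick covers the most uncovered labels.

    sentences: dict mapping a sentence id to the set of labels it covers
               (assumed here to be decision-tree node labels).
    Returns (selected ids in pick order, labels left uncovered).
    """
    uncovered = set()
    for labels in sentences.values():
        uncovered |= labels

    remaining = dict(sentences)
    selected = []
    while uncovered and remaining:
        if max_sentences is not None and len(selected) >= max_sentences:
            break
        # Iterate ids in sorted order so ties resolve deterministically.
        best = max(sorted(remaining), key=lambda s: len(remaining[s] & uncovered))
        gain = remaining[best] & uncovered
        if not gain:
            break  # no remaining sentence adds new coverage
        selected.append(best)
        uncovered -= gain
        del remaining[best]
    return selected, uncovered


# Hypothetical toy input: four candidate sentences and the nodes they reach.
candidates = {
    "s1": {"n1", "n2", "n3"},
    "s2": {"n3", "n4"},
    "s3": {"n4", "n5"},
    "s4": {"n1"},
}
selected, missed = greedy_select(candidates)
```

On this toy input the first pick is `s1` (three new nodes) and the second is `s3` (two new nodes), after which every node is covered. A weighted variant would replace the `len(...)` score with a sum of per-node weights, for example node frequencies.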

Yuan Dong | Lianhong Cai | Haila Wang | Dandan Cui | Denzhi Huang

[1] Paul C. Bagshaw, et al. A method of unit preselection for speech synthesis based on acoustic clustering and decision trees, 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03).

[2] Jianfen Cao, et al. Coarticulation and prosodic hierarchy i, 2006.

[3] Cai Lianhong. Corpus Analysis Based on Decision Tree, 2006.

[4] Vasek Chvátal, et al. A Greedy Heuristic for the Set-Covering Problem, 1979, Math. Oper. Res.

[5] Haiping Li, et al. Generating script using statistical information of the context variation unit vector, 2002, INTERSPEECH.

[6] Robert M Thrall, et al. Mathematics of Operations Research, 1978.

[7] Thierry Dutoit, et al. Text design for TTS speech corpus building using a modified greedy selection, 2003, INTERSPEECH.

[8] Hisashi Kawai, et al. A design method of speech corpus for text-to-speech synthesis taking account of prosody, 2000, INTERSPEECH.