Script Design Based on Decision Tree with Context Vector and Acoustic Distance for Mandarin TTS

The performance of a TTS system depends on the acoustic features of its output speech, yet only the linguistic information in the text context is available during the corpus design phase. To reconcile this mismatch, this paper proposes a script design method based on decision trees. First, trees are trained on an existing speech corpus: the splitting questions are selected from context vectors, and the distance metric is based on acoustic features. Then units from a new text corpus, which will be the source of the target script, are inserted into the trees. The sentence selection strategy combines coverage of major tree nodes with coverage of frequent context vector clusters. An experiment is carried out by collecting a Beijing Olympics related corpus, taking advantage of an existing general-domain speech corpus and supplementary domain-specific text. An informal listening test on TTS outputs confirms that the proposed method achieves better optimization of speech. The resulting script provides not only phonetic balance, as conventional methods do, but prosodic balance as well.
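As an illustration of the node-coverage part of sentence selection, the following is a minimal sketch and not the paper's actual procedure. It assumes that each candidate sentence has already been mapped to the tree nodes its units fall into, that a node's importance can be approximated by how many units of the text corpus land in it, and that sentences are picked greedily by uncovered weight. The function name `greedy_select`, the node IDs, and the weighting scheme are all hypothetical; the context vector cluster criterion is omitted.

```python
from collections import Counter


def greedy_select(sentence_nodes, max_sentences):
    """Greedily pick sentences that cover the most heavily populated tree nodes.

    sentence_nodes: dict mapping a sentence ID to the list of tree-node IDs
                    reached by its units (assumed to be precomputed).
    max_sentences:  upper bound on the size of the script.
    Returns (script, covered): the chosen sentence IDs in selection order
    and the set of covered node IDs.
    """
    # Assumption: a node's weight is the number of corpus units that fall into it.
    weights = Counter(n for nodes in sentence_nodes.values() for n in nodes)
    remaining = {s: set(nodes) for s, nodes in sentence_nodes.items()}
    covered, script = set(), []

    def gain(sentence_id):
        # Total weight of the nodes this sentence would newly cover.
        return sum(weights[n] for n in remaining[sentence_id] - covered)

    while remaining and len(script) < max_sentences:
        best = max(remaining, key=gain)  # ties go to the earliest candidate
        if gain(best) == 0:
            break  # no remaining sentence adds new coverage
        script.append(best)
        covered |= remaining.pop(best)
    return script, covered


if __name__ == "__main__":
    # Toy input with hypothetical sentence and node IDs.
    candidates = {
        "s1": ["n1", "n2", "n2"],
        "s2": ["n2", "n3"],
        "s3": ["n3"],
        "s4": ["n1", "n4"],
    }
    script, covered = greedy_select(candidates, max_sentences=2)
    print(script, sorted(covered))
```

A greedy pass is used here only because it is the simplest way to trade script length against coverage; any weighting that reflects which nodes count as "major" could replace the unit counts.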