Paper information - Data pruning approach to unit selection for inventory generation of concatenative embeddable Chinese TTS systems

This paper presents a data pruning approach for building the acoustic unit inventory of a syllable-based concatenative embeddable Chinese TTS system. A three-portion segmentation of the syllable is proposed, based on the voiced/unvoiced structure of Chinese syllables. Individual factorial acoustic measurements of each syllable are used to calculate a penalty for perceptually unsatisfactory concatenation. Based on the calculated penalties, bad syllables are removed from each cluster. The best syllable of each pruned cluster is then selected using a compromise acoustic measurement. Evaluation and application results show that the method is promising, particularly for generating acoustic unit databases for small-footprint concatenative Chinese (Cantonese and Mandarin) TTS systems.
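
The prune-then-select flow described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the acoustic factors, the penalty formula, or the selection criterion, so everything here is an assumption made for illustration: the factor names (`pitch`, `duration`, `energy`), the rule that a candidate is pruned when any single factor penalty exceeds a threshold, and the use of a weighted penalty sum as the "compromise" measurement. The names `Candidate`, `prune_cluster`, `select_best`, and `build_inventory` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    unit_id: str
    # Hypothetical per-factor penalties; the paper's actual factors are not given here.
    penalties: dict


def prune_cluster(cluster, threshold):
    """Drop candidates whose penalty on any single factor exceeds the threshold (assumed rule)."""
    return [c for c in cluster if max(c.penalties.values()) <= threshold]


def select_best(cluster, weights):
    """Pick the candidate with the lowest weighted penalty sum (assumed compromise measure)."""
    def combined(c):
        return sum(weights[f] * p for f, p in c.penalties.items())
    return min(cluster, key=combined)


def build_inventory(clusters, threshold, weights):
    """Map each syllable label to the id of its selected unit; skip fully pruned clusters."""
    inventory = {}
    for syllable, cluster in clusters.items():
        kept = prune_cluster(cluster, threshold)
        if kept:
            inventory[syllable] = select_best(kept, weights).unit_id
    return inventory


# Illustrative data only; the values are made up.
clusters = {
    "ma1": [
        Candidate("ma1_a", {"pitch": 0.0, "duration": 0.7, "energy": 0.0}),
        Candidate("ma1_b", {"pitch": 0.3, "duration": 0.3, "energy": 0.2}),
        Candidate("ma1_c", {"pitch": 0.2, "duration": 0.4, "energy": 0.4}),
    ],
    "ni3": [
        Candidate("ni3_a", {"pitch": 0.9, "duration": 0.1, "energy": 0.1}),
    ],
}
weights = {"pitch": 1.0, "duration": 1.0, "energy": 1.0}
inventory = build_inventory(clusters, threshold=0.5, weights=weights)
```

In this made-up data, `ma1_a` has the lowest total penalty but is pruned because its duration penalty exceeds the threshold, which shows why pruning on individual factors before selection can change the outcome. The single `ni3` candidate is also pruned, so that syllable gets no entry in this sketch; a real system would need a fallback for such clusters.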

Guilin Chen | Kaizhi Wang | Yiqing Zu | Zhenli Yu | Dongjian Yue

[1] Keikichi Hirose, et al. Pruning of redundant synthesis instances based on weighted vector quantization, 2001, INTERSPEECH.

[2] Alan W. Black, et al. Unit selection in a concatenative speech synthesis system using a large speech database, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[3] Paul Taylor, et al. Automatically clustering similar units for unit selection in speech synthesis, 1997, EUROSPEECH.

[4] Alan W. Black, et al. Optimal data selection for unit selection synthesis, 2001, SSW.