Applying pitch connection control in Mandarin speech synthesis

This paper describes a novel tone-based pitch connection control for unit selection that improves the naturalness of the output speech of a baseline Mandarin text-to-speech (TTS) system. The study focuses on pitch connections between concatenated syllables. To improve concatenation quality, we use the offset pitch of the preceding syllable and the onset pitch of the following syllable in unit selection. Based on statistics gathered from the corpus, three types of pitch connection constraints are proposed. Tone-based cost functions derived from the properties of each constraint play an important role in unit selection by improving continuity at the concatenation point. Applying these cost functions in unit selection yields more suitable units and more natural-sounding synthesized speech.
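The idea of scoring a join by the offset pitch of the preceding syllable and the onset pitch of the following one can be illustrated with a minimal sketch. It is not the paper's cost function: the three constraint types are not specified in the abstract, so the semitone distance, the `TONE_PAIR_WEIGHT` table and its values, the unit dictionary fields (`tone`, `onset_f0`, `offset_f0`), and the function names are all assumptions made for illustration. Only the pitch connection cost is modeled; a real unit selection system would also include target and spectral costs.

```python
import math

# Hypothetical tone-pair weights (placeholder values, not from the paper).
# A weight below 1.0 means a pitch jump is penalized less for that tone pair.
TONE_PAIR_WEIGHT = {(4, 1): 0.5}
DEFAULT_WEIGHT = 1.0


def semitones(f0_a, f0_b):
    """Signed interval from f0_a to f0_b in semitones (both in Hz)."""
    return 12.0 * math.log2(f0_b / f0_a)


def pitch_connection_cost(prev_unit, next_unit):
    """Weighted pitch gap between the offset of prev_unit and the onset of next_unit."""
    gap = abs(semitones(prev_unit["offset_f0"], next_unit["onset_f0"]))
    weight = TONE_PAIR_WEIGHT.get(
        (prev_unit["tone"], next_unit["tone"]), DEFAULT_WEIGHT
    )
    return weight * gap


def select_units(candidates):
    """Dynamic-programming search over candidate lists, one list per syllable.

    Returns (total_cost, path) for the sequence with the lowest summed
    pitch connection cost.
    """
    best = [(0.0, [unit]) for unit in candidates[0]]
    for column in candidates[1:]:
        new_best = []
        for unit in column:
            # Cheapest way to reach this unit from any path ending in the previous column.
            cost, path = min(
                ((c + pitch_connection_cost(p[-1], unit), p) for c, p in best),
                key=lambda item: item[0],
            )
            new_best.append((cost, path + [unit]))
        best = new_best
    return min(best, key=lambda item: item[0])
```

Called with one candidate list per syllable, `select_units` returns the sequence whose joins have the smallest total weighted pitch gap. Measuring the gap in semitones rather than Hz is a design choice of this sketch, made so that the same interval costs the same at different pitch ranges.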
