Paper information - Prosodic factors for predicting local pitch shape

Prosodic factors for predicting local pitch shape

In this paper, we investigate the predictive power of different prosodic factorization schemes with respect to pitch movement. We use the results to propose extending a standard diphone database with diphones recorded in different prosodic contexts. The goal of this research is to reduce the amount of pitch modification required, thereby improving the segmental quality of the synthetic voice. We present a factorization scheme based on the foot structure of utterances and show that this efficient scheme requires only a fairly small number of additional diphones to be recorded.

J. van Santen | J. Wouters | E. Klabbers
