Control and prediction of the impact of pitch modification on synthetic speech quality
To generate highly expressive speech convincingly with speech synthesis, the problem of poor prosody (in both prediction and generation) must be overcome. In this paper we show that a simple annotation scheme based on the notion of foot structure lets us predict the shape of local pitch contours more accurately. The assumption is that a better selection mechanism reduces the amount of pitch modification required, and thereby reduces speech degradation. In addition, we present a perceptual experiment that investigates the degradation introduced by pitch modification with the OGIresLPC algorithm. We correlated the weighted perceptual score with different pitch and delta pitch distances. The best combination of distance measures explains 63% of the variance in the perceptual scores. Decreasing the pitch is shown to have a greater impact on perception than increasing it.
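As a rough illustration of the kind of analysis described above, the following sketch computes a pitch distance and a delta pitch distance between two F0 contours and the share of variance in a set of scores that one such distance explains. The abstract does not define the distance measures, so everything here is an assumption made for illustration: the semitone scale, the RMS form of both distances, the 100 Hz reference, and all function names. It also uses a single predictor, whereas the abstract reports a combination of measures.

```python
import math


def semitones(f0_hz, ref_hz=100.0):
    """Convert F0 values in Hz to semitones relative to ref_hz (assumed reference)."""
    return [12.0 * math.log2(f / ref_hz) for f in f0_hz]


def rms(values):
    """Root mean square of a non-empty sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def deltas(seq):
    """Frame-to-frame differences of a sequence."""
    return [b - a for a, b in zip(seq, seq[1:])]


def pitch_distance(source_hz, target_hz):
    """Assumed measure: RMS difference between two contours, in semitones."""
    s, t = semitones(source_hz), semitones(target_hz)
    return rms([a - b for a, b in zip(s, t)])


def delta_pitch_distance(source_hz, target_hz):
    """Assumed measure: RMS difference between the contours' frame-to-frame slopes."""
    ds = deltas(semitones(source_hz))
    dt = deltas(semitones(target_hz))
    return rms([a - b for a, b in zip(ds, dt)])


def r_squared(x, y):
    """Variance in y explained by a one-predictor linear fit on x
    (the squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)
```

With these definitions, a contour shifted up by a constant musical interval has a nonzero pitch distance but a delta pitch distance of zero, since its shape is unchanged. Calling `r_squared` on a list of distances and a matching list of perceptual scores gives the explained-variance figure for that one measure.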
[1] Bernd Möbius, et al. Rare Events and Closed Domains: Two Delicate Concepts in Speech Synthesis, 2003, Int. J. Speech Technol.
[2] Alexander Kain, et al. OGIresLPC: Diphone synthesizer using residual-excited linear prediction, 1997.
[3] Julia Hirschberg, et al. Segmental effects on timing and height of pitch contours, 1994, ICSLP.
[4] J. van Santen, et al. Prosodic factors for predicting local pitch shape, 2002, Proceedings of 2002 IEEE Workshop on Speech Synthesis, 2002.