Pitch alteration technique in speech synthesis system
In speech synthesis, the waveform coding method is used mainly in synthesis by analysis because of its high quality. Because the parameters of this coding method are not separated into excitation parameters and vocal tract parameters, it is difficult to apply the method to synthesis by rule. Applying waveform coding to synthesis by rule therefore requires pitch alteration for prosody control. In speech synthesis by the conventional PSOLA (pitch synchronous overlap and add) technique, a symmetric window function is applied to the asymmetric speech waveform, which produces an energy imbalance whose size depends on the degree of overlap when the pitch interval is adjusted. To overcome this energy imbalance, we propose a new method that converts the asymmetric waveform into a symmetric one by time-frequency conversion. With this method, the average spectral distortion ratio across pitch alteration ratios is 6.38%.
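The dependence of output energy on the degree of overlap can be illustrated with a minimal sketch. It is not the proposed time-frequency conversion method. It only sums symmetric windows at a changed pitch spacing and reports how far the summed envelope departs from a constant. The Hann window, the window length of two pitch periods, and the names `hann` and `overlap_gain` are assumptions made for illustration; the abstract does not specify them.

```python
import math

def hann(n):
    """Periodic Hann window of length n (an assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def overlap_gain(period, new_period, n_frames=20):
    """Overlap-add Hann windows of length 2*period every new_period samples.

    Returns (min, max) of the summed window envelope, ignoring the
    partially overlapped edges. A constant envelope means the overlap-add
    step itself introduces no gain fluctuation.
    """
    win = hann(2 * period)
    total = [0.0] * (new_period * (n_frames - 1) + len(win))
    for k in range(n_frames):
        start = k * new_period          # synthesis pitch mark for frame k
        for i, w in enumerate(win):
            total[start + i] += w
    middle = total[len(win):len(total) - len(win)]  # steady-state region
    return min(middle), max(middle)

if __name__ == "__main__":
    # Analysis pitch period of 80 samples; 60 raises the pitch, 100 lowers it.
    for new_period in (80, 60, 100):
        lo, hi = overlap_gain(80, new_period)
        print(f"spacing {new_period}: envelope min {lo:.3f}, max {hi:.3f}")
```

With the spacing unchanged, the windows sum to a constant envelope of 1. When the spacing is shortened or lengthened, the envelope ripples within each pitch interval, so the gain applied to the waveform varies with the degree of overlap.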