Concatenative speech synthesis by minimum distortion criteria

A scheme is proposed for concatenative speech synthesis to improve the segment selection procedure by minimizing acoustic distortion between the selected segment and the desired spectrum for the target. The spectral prototypicality of a segment, the spectral difference between the source and target contexts, the degradation resulting from concatenation of phonemes, and the acoustic continuity between the concatenated segments are all considered as measures. A search method for selecting segments from a large speech database is also described. In this method, a three-step optimization is used for distortion minimization. A perceptual test shows that contextual spectral difference and acoustic continuity at the segment boundary are important measures for improving the quality of synthesized speech.
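As a rough illustration of how the four distortion measures could be combined during selection, the sketch below scores each candidate segment with a weighted unit cost (prototypicality plus contextual spectral difference) and each pair of neighbors with a join cost (boundary continuity plus a fixed concatenation penalty). The abstract does not spell out the three-step optimization, so a Viterbi-style dynamic program is used here as a stand-in. The `Segment` fields, the weights, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Segment:
    """Hypothetical database entry; every field is an illustrative assumption."""
    phoneme: str
    db_index: int         # position of the segment in the source recording
    start_spec: tuple     # spectral vector at the left boundary
    end_spec: tuple       # spectral vector at the right boundary
    centroid_dist: float  # distance to the phoneme's spectral centroid
    context_dist: float   # spectral difference, source vs. target context


def unit_cost(seg, w_proto=1.0, w_ctx=1.0):
    # Prototypicality and contextual difference (weights are assumed).
    return w_proto * seg.centroid_dist + w_ctx * seg.context_dist


def join_cost(prev, cur, w_cont=1.0, w_concat=0.5):
    # Segments adjacent in the recording need no cut, so joining is free.
    if cur.db_index == prev.db_index + 1:
        return 0.0
    # Otherwise: boundary spectral mismatch plus a fixed concatenation penalty.
    return w_cont * dist(prev.end_spec, cur.start_spec) + w_concat


def select_segments(candidates):
    """candidates[t] lists the Segments available for target phoneme t.

    Returns (total_cost, path) for the minimum-distortion sequence.
    """
    if not candidates or any(not layer for layer in candidates):
        raise ValueError("every target position needs at least one candidate")

    # history[t][i] = (best cost ending in candidates[t][i], back-pointer)
    history = [[(unit_cost(s), None) for s in candidates[0]]]
    for t in range(1, len(candidates)):
        layer = []
        for cur in candidates[t]:
            cost, back = min(
                (history[-1][j][0] + join_cost(prev, cur), j)
                for j, prev in enumerate(candidates[t - 1])
            )
            layer.append((cost + unit_cost(cur), back))
        history.append(layer)

    # Backtrack from the cheapest final candidate.
    idx = min(range(len(history[-1])), key=lambda i: history[-1][i][0])
    total = history[-1][idx][0]
    path = []
    for t in range(len(candidates) - 1, -1, -1):
        path.append(candidates[t][idx])
        idx = history[t][idx][1]
    path.reverse()
    return total, path
```

In this toy formulation, two segments that were recorded back to back are preferred even when each is slightly less prototypical than an isolated alternative, because they incur no join cost. That loosely mirrors the abstract's finding that boundary continuity matters perceptually, although the actual measures and weights in the paper may differ.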
