Paper information - A New Spectral Smoothing Algorithm for Unit Concatenating Speech Synthesis

Speech unit concatenation with a large database is currently the most popular method for speech synthesis. In this approach, mismatches at unit boundaries are unavoidable and are one cause of quality degradation. This paper proposes an algorithm to reduce undesired discontinuities between consecutive units. Optimal matching points are calculated in two steps: first, the Kullback-Leibler distance is used for spectral matching; then unit sliding and overlap windowing are used for waveform matching. The proposed algorithm is implemented in a corpus-based, unit-concatenating Korean text-to-speech system with an automatically labeled database. Experimental results show that the algorithm performs better than either raw concatenation or the overlap smoothing method.
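The abstract does not give implementation details, so the following is only a rough sketch of how a two-step join of this kind might look, not the authors' method. It assumes a symmetric Kullback-Leibler distance over normalized power spectra, an exhaustive search over small trims at the boundary, and a linear overlap window. All function names and parameters (`power_spectrum`, `symmetric_kl`, `best_join`, `crossfade`, `frame_len`, `max_shift`, `overlap`) are hypothetical.

```python
import cmath
import math


def power_spectrum(frame):
    """Normalized power spectrum of a real frame via a naive DFT (sums to 1)."""
    n = len(frame)
    power = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        power.append(abs(acc) ** 2 + 1e-12)  # small floor avoids log(0)
    total = sum(power)
    return [p / total for p in power]


def symmetric_kl(p, q):
    """Symmetric KL distance: KL(p||q) + KL(q||p) for normalized spectra."""
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


def best_join(left, right, frame_len, max_shift):
    """Assumed search: try trimming i samples from the end of `left` and
    j samples from the start of `right`; keep the pair whose boundary
    frames have the smallest symmetric KL distance."""
    best = None
    for i in range(max_shift + 1):
        end = len(left) - i
        p = power_spectrum(left[end - frame_len:end])
        for j in range(max_shift + 1):
            q = power_spectrum(right[j:j + frame_len])
            d = symmetric_kl(p, q)
            if best is None or d < best[0]:
                best = (d, i, j)
    return best[1], best[2]


def crossfade(left, right, overlap):
    """Join two units with a linear overlap window (requires overlap >= 1)."""
    mixed = []
    for n in range(overlap):
        w = (n + 1) / (overlap + 1)  # fade-in weight for the right unit
        mixed.append((1 - w) * left[len(left) - overlap + n] + w * right[n])
    return left[:-overlap] + mixed + right[overlap:]
```

A caller would first pick `i, j = best_join(left, right, frame_len, max_shift)`, trim both units accordingly, and then pass the trimmed units to `crossfade`. The naive DFT keeps the sketch dependency-free; a real system would use an FFT and likely a smoothed spectral envelope rather than the raw power spectrum.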

Minsoo Hahn | Sang-Jin Kim | Hyun Bae Han | Kyung Ae Jang
