Unit fusion for concatenative speech synthesis

An important problem in concatenative synthesis is the occurrence of spectral discontinuities, or “concatenation mismatch,” between sonorant speech units. In this paper, we present an approach that reduces concatenation mismatch by combining spectral information from two sequences of speech units selected in parallel. Concatenation units, on one hand, define initial spectral trajectories for a target utterance. Fusion units, on the other hand, define the desired transitions between concatenated units. The two unit sequences are “fused” by imposing dynamic constraints defined by the fusion units on the spectral trajectories of the concatenation units. To regenerate the modified speech units, we use a synthesis algorithm based on sinusoidal + all-pole analysis of speech, which overcomes the limitations of residual-excited LPC. Results from a perceptual test show that our method is highly successful at removing concatenation artifacts in speech generated from an inventory of diphones.
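The fusion step described above can be illustrated with a minimal sketch. The function below is a hypothetical illustration, not the paper's actual algorithm: it assumes spectral trajectories are stored as `(n_frames, n_coeffs)` NumPy arrays of spectral parameters (e.g. line spectral frequencies), and it imposes the frame-to-frame deltas of a fusion trajectory on a concatenation trajectory around the join, crossfading back to the original values away from the boundary. The names `fuse_trajectories`, `join`, and `half_width` are invented for this example.

```python
import numpy as np

def fuse_trajectories(concat_traj, fusion_traj, join, half_width=5):
    """Impose the local dynamics (frame deltas) of fusion_traj on
    concat_traj in a window around the join frame.

    Hypothetical sketch: both trajectories are (n_frames, n_coeffs)
    arrays of spectral parameters, assumed time-aligned.
    """
    fused = concat_traj.astype(float).copy()
    lo = max(join - half_width, 1)
    hi = min(join + half_width, len(concat_traj) - 1)

    # Deltas of the fusion units define the desired transition.
    deltas = np.diff(fusion_traj.astype(float), axis=0)

    # Integrate the fusion deltas forward from the left anchor frame,
    # so the trajectory follows the fusion units' dynamics across the join.
    for t in range(lo, hi + 1):
        fused[t] = fused[t - 1] + deltas[t - 1]

    # Crossfade back to the original concatenation trajectory at the
    # right edge of the window, anchoring both ends of the fused span.
    w = np.linspace(0.0, 1.0, hi - lo + 1)[:, None]
    fused[lo:hi + 1] = (1 - w) * fused[lo:hi + 1] \
        + w * concat_traj[lo:hi + 1].astype(float)
    return fused
```

On a trajectory with an abrupt spectral jump at the join, this smoothing reduces the largest frame-to-frame discontinuity while leaving frames outside the window untouched, which is the qualitative effect the fusion method aims for.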