Unit fusion for concatenative speech synthesis

An important problem in concatenative synthesis is the occurrence of spectral discontinuities, or “concatenation mismatch,” between sonorant speech units. In this paper, we present an approach that reduces concatenation mismatch by combining spectral information from two sequences of speech units selected in parallel. Concatenation units, on one hand, define initial spectral trajectories for a target utterance. Fusion units, on the other hand, define the desired transitions between concatenated units. The two unit sequences are “fused” by imposing dynamic constraints defined by the fusion units on the spectral trajectories of the concatenation units. To regenerate the modified speech units, we use a synthesis algorithm based on sinusoidal + all-pole analysis of speech, which overcomes the limitations of residual-excited LPC. Results from a perceptual test show that our method is highly successful at removing concatenation artifacts in speech generated from an inventory of diphones.
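The fusion step described above can be illustrated with a minimal sketch. The function below is a hypothetical illustration, not the paper's actual algorithm: it assumes spectral trajectories are stored as `(n_frames, n_coeffs)` NumPy arrays of spectral parameters (e.g. line spectral frequencies), and it imposes the frame-to-frame deltas of a fusion trajectory on a concatenation trajectory around the join, crossfading back to the original values away from the boundary. The names `fuse_trajectories`, `join`, and `half_width` are invented for this example.

```python
import numpy as np

def fuse_trajectories(concat_traj, fusion_traj, join, half_width=5):
    """Impose the local dynamics (frame deltas) of fusion_traj on
    concat_traj in a window around the join frame.

    Hypothetical sketch: both trajectories are (n_frames, n_coeffs)
    arrays of spectral parameters, assumed time-aligned.
    """
    fused = concat_traj.astype(float).copy()
    lo = max(join - half_width, 1)
    hi = min(join + half_width, len(concat_traj) - 1)

    # Deltas of the fusion units define the desired transition.
    deltas = np.diff(fusion_traj.astype(float), axis=0)

    # Integrate the fusion deltas forward from the left anchor frame,
    # so the trajectory follows the fusion units' dynamics across the join.
    for t in range(lo, hi + 1):
        fused[t] = fused[t - 1] + deltas[t - 1]

    # Crossfade back to the original concatenation trajectory at the
    # right edge of the window, anchoring both ends of the fused span.
    w = np.linspace(0.0, 1.0, hi - lo + 1)[:, None]
    fused[lo:hi + 1] = (1 - w) * fused[lo:hi + 1] \
        + w * concat_traj[lo:hi + 1].astype(float)
    return fused
```

On a trajectory with an abrupt spectral jump at the join, this smoothing reduces the largest frame-to-frame discontinuity while leaving frames outside the window untouched, which is the qualitative effect the fusion method aims for.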