DFW-based spectral smoothing for concatenative speech synthesis

This paper proposes and evaluates a new spectral smoothing technique whose performance is comparable with that of line spectral pair (LSP) interpolation in terms of Euclidean spectral distance measurements, but whose interpolated formant trajectories are more reasonable from a phonetic point of view. The approach first estimates derivative logarithmic magnitude spectra for both the source and the target frame, each represented by autoregressive filter coefficients. Dynamic programming (DP) then yields the best alignment between these two spectral representations. Smoothed frequency responses are obtained by weighted linear interpolation between corresponding source and target spectral lines, which are paired by DP backtracking. Finally, the interpolated spectrum is converted back to autoregressive filter coefficients, with autocorrelation coefficients as an intermediate stage.
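The pipeline described above can be sketched roughly as follows. This is an illustrative reading of the abstract, not the authors' implementation: the function names, the 128-point frequency grid, the absolute-difference local cost, the three-way DP step pattern, and the choice to interpolate log magnitudes and bin positions along the alignment path are all assumptions made for the example.

```python
import numpy as np

def lpc_log_spectrum(a, n_bins=128):
    """Log magnitude of 1/A(z) at n_bins frequencies in [0, pi]; a[0] must be 1."""
    w = np.linspace(0.0, np.pi, n_bins)
    A = np.exp(-1j * np.outer(w, np.arange(len(a)))) @ np.asarray(a, dtype=float)
    return -np.log(np.abs(A))

def dp_align(x, y):
    """Align two sequences along the frequency axis; returns (i, j) index pairs."""
    n, m = len(x), len(y)
    cost = np.abs(x[:, None] - y[None, :])      # assumed local distance
    acc = np.full((n, m), np.inf)
    acc[0, 0] = cost[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            prev = []
            if i > 0:
                prev.append(acc[i - 1, j])
            if j > 0:
                prev.append(acc[i, j - 1])
            if i > 0 and j > 0:
                prev.append(acc[i - 1, j - 1])
            acc[i, j] = cost[i, j] + min(prev)
    # Backtrack from the last bin pair to the first.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        cands = []
        if i > 0 and j > 0:
            cands.append((acc[i - 1, j - 1], i - 1, j - 1))
        if i > 0:
            cands.append((acc[i - 1, j], i - 1, j))
        if j > 0:
            cands.append((acc[i, j - 1], i, j - 1))
        _, i, j = min(cands)
        path.append((i, j))
    return path[::-1]

def warp_interpolate(S, T, path, alpha):
    """Weighted linear interpolation between aligned spectral lines (alpha in [0, 1])."""
    i, j = np.array(path).T
    pos = (1.0 - alpha) * i + alpha * j          # interpolated bin position
    mag = (1.0 - alpha) * S[i] + alpha * T[j]    # interpolated log magnitude
    pos, first = np.unique(pos, return_index=True)
    return np.interp(np.arange(len(S)), pos, mag[first])  # back to a uniform grid

def spectrum_to_lpc(log_mag, order):
    """Power spectrum -> autocorrelation (inverse FFT) -> Levinson-Durbin recursion."""
    r = np.fft.irfft(np.exp(2.0 * log_mag))[: order + 1]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for m in range(1, order + 1):
        k = -(r[m] + a[1:m] @ r[m - 1:0:-1]) / err
        a[1:m] = a[1:m] + k * a[m - 1:0:-1]
        a[m] = k
        err *= 1.0 - k * k
    return a

def smooth_frame(a_src, a_tgt, alpha, n_bins=128):
    """Return AR coefficients for a frame a fraction alpha of the way from source to target."""
    S = lpc_log_spectrum(a_src, n_bins)
    T = lpc_log_spectrum(a_tgt, n_bins)
    path = dp_align(np.gradient(S), np.gradient(T))  # align derivative log spectra
    return spectrum_to_lpc(warp_interpolate(S, T, path, alpha), len(a_src) - 1)
```

Calling `smooth_frame(a_src, a_tgt, alpha)` for a sequence of `alpha` values between 0 and 1 would give one set of filter coefficients per transition frame. In this sketch the coefficients follow the convention A(z) = 1 + a1 z^-1 + ..., and the filter gain is ignored.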
