Paper information - A speech model of acoustic inventories based on asynchronous interpolation


We propose a speech model that describes the acoustic inventories of concatenative synthesizers. The model has the following characteristics: (i) its representations are very compact, so high compression ratios are possible, (ii) re-synthesized speech is free of concatenation errors, (iii) the degree of articulation can be controlled explicitly, and (iv) voice transformation is feasible with relatively few additional recordings of a target speaker. The model represents a speech unit as a synthesis of several types of features, each computed by non-linear, asynchronous interpolation of neighboring basis vectors associated with known phonemic identities. During analysis, basis vectors and transition weights are estimated under a strict diphone assumption using a dynamic time warping approach. During synthesis, the estimated transition weights are modified to produce changes in duration and articulation effort.
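The interpolation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sigmoid weight shape, the function names, the stream names, and all numeric values are assumptions chosen for illustration, and the dynamic-time-warping analysis that estimates the weights is not shown. Each feature type moves from a left-phoneme basis vector to a right-phoneme basis vector under its own transition weights, so the streams need not change at the same time.

```python
import math


def sigmoid_weights(n_frames, midpoint, slope):
    """Hypothetical non-linear transition weights in (0, 1).

    midpoint is the normalized time (0..1) at which the weight reaches 0.5;
    slope controls how abrupt the transition is.
    """
    weights = []
    for i in range(n_frames):
        t = i / (n_frames - 1) if n_frames > 1 else 0.0
        weights.append(1.0 / (1.0 + math.exp(-slope * (t - midpoint))))
    return weights


def interpolate_stream(left, right, weights):
    """Blend two basis vectors frame by frame: (1 - w) * left + w * right."""
    return [[(1.0 - w) * a + w * b for a, b in zip(left, right)]
            for w in weights]


def synthesize_unit(streams, n_frames):
    """Interpolate every feature stream with its own weight trajectory.

    streams maps a stream name to (left_basis, right_basis, midpoint, slope).
    Changing n_frames changes the duration of the unit.
    """
    return {
        name: interpolate_stream(left, right,
                                 sigmoid_weights(n_frames, midpoint, slope))
        for name, (left, right, midpoint, slope) in streams.items()
    }


# Illustrative values only: two formant frequencies (Hz) and one energy value.
streams = {
    "formants": ([700.0, 1200.0], [300.0, 2300.0], 0.4, 12.0),
    "energy": ([0.8], [0.5], 0.6, 8.0),
}
unit = synthesize_unit(streams, 11)
longer_unit = synthesize_unit(streams, 21)
```

In this sketch the formant stream reaches its halfway point earlier than the energy stream, which is what "asynchronous" refers to here. Calling `synthesize_unit` with a different `n_frames` stretches or compresses the unit in time, and changing `slope` would be one hypothetical way to make a transition more or less abrupt.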

Jan P. H. van Santen | Alexander Kain
