Recovering vocal tract shapes from MFCC parameters

Recovering vocal tract shapes from the speech signal is a well-known inverse problem: undoing the transformation from the articulatory system to speech acoustics. Most past studies of this problem have focused on vowels, and no general method has proved effective for recovering vocal tract shapes from the speech signal across all classes of speech sounds. In this paper we describe our approach to speech inverse mapping, using mel-frequency cepstral coefficients (MFCCs) to represent the acoustic parameters of the speech signal. An inversion method is developed based on Kalman filtering and a dynamic-system model describing articulatory motion. The method uses an articulatory-acoustic codebook derived from Maeda's articulatory model.
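To make the filtering idea concrete, below is a minimal sketch of an extended-Kalman-filter inversion loop of the kind the abstract describes: a linear dynamic model for the articulatory state and MFCC frames as observations. Everything specific here is an assumption, not the paper's method: the state and MFCC dimensions, the matrices A, Q, R, and in particular the placeholder mapping h(x), which merely stands in for the articulatory-to-acoustic codebook derived from Maeda's model.

```python
import numpy as np

# Hedged sketch of Kalman-filter-based articulatory inversion.
# The articulatory state x_t follows a simple linear dynamic model,
# and the observation y_t is an MFCC vector produced by a hypothetical
# articulatory-to-acoustic mapping h(x). In the paper this mapping comes
# from a codebook built with Maeda's articulatory model; here it is only
# a placeholder nonlinear function, so an extended Kalman filter is used.

N_ART = 7      # number of articulatory parameters (assumed, as in Maeda's model)
N_MFCC = 12    # MFCC dimension (assumed)

def h(x):
    """Placeholder articulatory-to-MFCC mapping (stands in for codebook lookup)."""
    W = np.sin(np.outer(np.arange(1, N_MFCC + 1), np.arange(1, N_ART + 1)))
    return np.tanh(W @ x)

def numerical_jacobian(f, x, eps=1e-5):
    """Finite-difference Jacobian of f at x (linearization step of the EKF)."""
    fx = f(x)
    J = np.zeros((fx.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (f(x + dx) - fx) / eps
    return J

def ekf_inversion(mfcc_frames, A, Q, R, x0, P0):
    """Estimate an articulatory trajectory from a sequence of MFCC frames."""
    x, P = x0.copy(), P0.copy()
    trajectory = []
    for y in mfcc_frames:
        # Predict: articulatory state evolves under the dynamic model.
        x = A @ x
        P = A @ P @ A.T + Q
        # Update: correct the prediction with the observed MFCC frame.
        H = numerical_jacobian(h, x)
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (y - h(x))
        P = (np.eye(N_ART) - K @ H) @ P
        trajectory.append(x.copy())
    return np.array(trajectory)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = 0.95 * np.eye(N_ART)     # smooth, slowly varying articulatory motion (assumed)
    Q = 0.01 * np.eye(N_ART)     # process noise covariance (assumed)
    R = 0.05 * np.eye(N_MFCC)    # observation noise covariance (assumed)
    x0, P0 = np.zeros(N_ART), np.eye(N_ART)
    # Synthetic MFCC sequence generated from a known articulatory path.
    true_x = np.cumsum(0.05 * rng.standard_normal((50, N_ART)), axis=0)
    mfccs = np.array([h(x) + 0.05 * rng.standard_normal(N_MFCC) for x in true_x])
    est = ekf_inversion(mfccs, A, Q, R, x0, P0)
    print("Estimated articulatory trajectory shape:", est.shape)
```

In practice the observation function would be the codebook-derived mapping rather than a closed-form expression, and the dynamic model's parameters would be chosen to reflect realistic articulator dynamics; this sketch only illustrates the predict/update structure of the filtering approach.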
