Methods for Integrating Phonetic and Phonological Knowledge in Speech Inversion

Exploiting information about the vocal tract shape that produced a speech signal has long appealed to speech researchers. Experimental studies that incorporated articulatory information from physiological measurements support the idea that this information can be useful in a number of areas of speech science and technology. However, estimating articulatory trajectories from the acoustic speech signal is a difficult, ill-posed problem. Among the more recent methods proposed to solve it are those that integrate linguistic knowledge in the form of phonetic or phonological constraints. This paper reviews and discusses methods for applying phonetic and phonological constraints in order to obtain unique solutions to acoustic-to-articulatory inversion.

Key-Words: speech inversion, acoustic-to-articulatory mapping, vocal-tract estimation, phonetics, phonology
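The abstract does not specify a particular algorithm, but the toy Python/NumPy sketch below illustrates one common way such constraints can be used: inversion is posed as a regularized optimization in which a forward articulatory-to-acoustic model is inverted under a frame-to-frame smoothness penalty and a phonological target penalty on a critical articulator. All names, dimensions, and weights here (forward, lam_smooth, lam_phon, the closure target) are illustrative assumptions, not details taken from the paper or from the methods it reviews.

import numpy as np

# Toy illustration only (not the paper's method): acoustic-to-articulatory
# inversion framed as regularized optimization.  A hypothetical forward model
# maps articulatory parameters to acoustic features; inversion minimizes the
# acoustic mismatch plus a smoothness term and a phonological target penalty.

rng = np.random.default_rng(0)

n_artic = 4      # articulatory parameters per frame (hypothetical)
n_acoust = 6     # acoustic features per frame (hypothetical)
T = 20           # number of analysis frames

# Hypothetical linear articulatory-to-acoustic forward map.
A = rng.normal(size=(n_acoust, n_artic))

def forward(x):
    # x: (T, n_artic) articulatory trajectory -> (T, n_acoust) acoustics
    return x @ A.T

# Simulate "observed" acoustics from a smooth articulatory trajectory.
t = np.linspace(0.0, 1.0, T)[:, None]
x_true = np.sin(2.0 * np.pi * t * np.arange(1, n_artic + 1))
y_obs = forward(x_true) + 0.01 * rng.normal(size=(T, n_acoust))

# Illustrative phonological constraint: frames 8-12 belong to a segment whose
# critical articulator (parameter 0) must reach a closure target of 1.0.
seg = slice(8, 13)
target = np.zeros((T, n_artic))
mask = np.zeros((T, n_artic))
target[seg, 0] = 1.0
mask[seg, 0] = 1.0

lam_smooth = 1.0   # weight of the frame-to-frame smoothness penalty
lam_phon = 5.0     # weight of the phonological target penalty

def cost_and_grad(x):
    # Cost = acoustic mismatch + smoothness + phonological penalty.
    r = forward(x) - y_obs
    dx = np.diff(x, axis=0)
    cost = (np.sum(r ** 2)
            + lam_smooth * np.sum(dx ** 2)
            + lam_phon * np.sum(mask * (x - target) ** 2))
    grad = 2.0 * r @ A
    grad[:-1] -= 2.0 * lam_smooth * dx
    grad[1:] += 2.0 * lam_smooth * dx
    grad += 2.0 * lam_phon * mask * (x - target)
    return cost, grad

# Plain gradient descent over the whole trajectory.
x = np.zeros((T, n_artic))
for _ in range(2000):
    cost, grad = cost_and_grad(x)
    x -= 0.01 * grad

print(f"final cost: {cost:.4f}")
print(f"mean value of the constrained articulator in frames 8-12: {x[seg, 0].mean():.3f}")

In practice the forward model would be learned from parallel articulatory-acoustic data (for example X-ray microbeam or electromagnetic articulography recordings) rather than being a fixed linear map, and the constraints would be derived from a phonetic segmentation or a phonological feature specification of the utterance rather than hand-set as above.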
