Acoustic-To-Articulatory Inversion Using Dynamical and Phonological Constraints

A well-known difficulty in using the articulatory representation for applications in speech coding, synthesis and recognition is the poor accuracy with which articulatory parameters are estimated from the acoustic speech signal. The difficulty is especially serious for most classes of consonantal sounds. This paper presents a statistical method of estimating the articulatory trajectories from the speech signal, based on training databases of articulatory-acoustic parameters obtained from continuous speech utterances. The estimation of articulatory trajectories uses the extended Kalman filtering (EKF) technique and is based on new linguistic constraints imposed on acoustic-to-articulatory inversion. These new constraints are mainly implemented by dividing the whole articulatory-acoustic function into a number of phonological sub-functions, each corresponding to a unit of speech defined as the pattern of continuous transition between two consecutive phonemes. The articulatory-acoustic sub-function is a part of the state-space model that represents each phonological unit of speech. A method of segmenting the speech signal and recognizing the phonological units was developed based on likelihood computation from Kalman filtering with different models. The final estimate of the articulatory trajectories is obtained from a Kalman smoother using the parameters of the recognized models. Estimation results compared to articulographic and X-ray speech data are presented in this paper. Average RMS errors of about 2 mm have been obtained between estimated and actual articulatory trajectories.
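The idea of recognizing a unit by comparing Kalman-filter likelihoods across models can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the model form, dimensions, or segmentation procedure, so the linear state dynamics `F`, the finite-difference Jacobian, the toy observation functions `h_a` and `h_b`, and all function names below are assumptions chosen for illustration. The smoothing pass mentioned in the abstract is omitted.

```python
import numpy as np

def numerical_jacobian(h, x, eps=1e-6):
    """Forward-difference Jacobian of h at x (an illustrative choice)."""
    y0 = h(x)
    J = np.zeros((y0.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (h(x + dx) - y0) / eps
    return J

def ekf_log_likelihood(obs, F, Q, h, R, x0, P0):
    """Run an EKF over obs (T x m); return (log-likelihood, filtered states)."""
    x, P = x0.copy(), P0.copy()
    loglik = 0.0
    states = []
    for y in obs:
        # Predict: assumed linear dynamics x_t = F x_{t-1} + w_t, w_t ~ N(0, Q)
        x = F @ x
        P = F @ P @ F.T + Q
        # Linearize the nonlinear observation function around the prediction
        H = numerical_jacobian(h, x)
        innov = y - h(x)
        S = H @ P @ H.T + R
        # Update state and covariance with the Kalman gain
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ innov
        P = (np.eye(x.size) - K @ H) @ P
        # Accumulate the Gaussian log-density of the innovation
        _, logdet = np.linalg.slogdet(S)
        loglik += -0.5 * (innov @ np.linalg.solve(S, innov)
                          + logdet + y.size * np.log(2.0 * np.pi))
        states.append(x.copy())
    return loglik, np.array(states)

def pick_model(obs, models, x0, P0):
    """Score each candidate model (F, Q, h, R) and return the most likely one."""
    scores = {name: ekf_log_likelihood(obs, *params, x0, P0)[0]
              for name, params in models.items()}
    return max(scores, key=scores.get), scores

# Toy data: one hidden "articulatory" variable observed through sub-function h_a.
def h_a(x):
    return np.array([x[0], x[0] ** 2])

def h_b(x):
    return np.array([x[0], -x[0] ** 2])

rng = np.random.default_rng(0)
T = 50
truth = 1.0 + 0.2 * np.sin(0.1 * np.arange(T))
obs = np.stack([h_a(np.array([v])) for v in truth])
obs = obs + rng.normal(scale=0.1, size=obs.shape)

F, Q, R = np.eye(1), 0.01 * np.eye(1), 0.01 * np.eye(2)
models = {"a": (F, Q, h_a, R), "b": (F, Q, h_b, R)}
x0, P0 = np.array([0.5]), np.eye(1)

best, scores = pick_model(obs, models, x0, P0)
_, filtered = ekf_log_likelihood(obs, *models[best], x0, P0)
```

In this toy setup each entry of `models` stands in for one phonological unit, and the unit whose sub-function best explains the observed frames receives the highest accumulated innovation log-likelihood. A real system would also have to decide segment boundaries, which this sketch does not attempt.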