An automatic speech recognition system using neural networks and linear dynamic models to recover and model articulatory traces

We describe a speech recognition system that uses articulatory parameters as its basic features, together with phone-dependent linear dynamic models. The system first estimates articulatory trajectories from the speech signal: a recurrent neural network, trained on real articulatory data, produces estimates of the x and y coordinates of seven actual articulator positions in the midsagittal plane every 2 milliseconds. The output of this network is then passed to a set of linear dynamic models, which perform phone recognition.
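
To make the two-stage architecture concrete, the following is a minimal sketch, not the authors' implementation: it assumes 14-dimensional articulatory frames (x/y coordinates of the seven articulators) and scores each frame sequence under a per-phone linear dynamic model using the standard Kalman-filter innovation log-likelihood. The names `PhoneLDM` and `classify_segment`, and all parameter shapes, are hypothetical illustrations.

```python
import numpy as np

class PhoneLDM:
    """Linear dynamic model for one phone:
        state:       x_t = F x_{t-1} + w,  w ~ N(0, Q)
        observation: y_t = H x_t + v,      v ~ N(0, R)
    Observations y_t are 14-dim articulatory frames
    (x/y positions of 7 articulators).
    """
    def __init__(self, F, Q, H, R, x0, P0):
        self.F, self.Q, self.H, self.R = F, Q, H, R
        self.x0, self.P0 = x0, P0  # initial state mean and covariance

    def log_likelihood(self, Y):
        """Kalman-filter innovation log-likelihood of a trajectory Y (T x 14)."""
        x, P = self.x0, self.P0
        ll = 0.0
        for y in Y:
            # Predict the next state and its covariance.
            x = self.F @ x
            P = self.F @ P @ self.F.T + self.Q
            # Innovation: difference between observed and predicted frame.
            e = y - self.H @ x
            S = self.H @ P @ self.H.T + self.R
            ll += -0.5 * (e @ np.linalg.solve(S, e)
                          + np.linalg.slogdet(S)[1]
                          + len(y) * np.log(2.0 * np.pi))
            # Update the state estimate with the Kalman gain.
            K = P @ self.H.T @ np.linalg.inv(S)
            x = x + K @ e
            P = P - K @ self.H @ P
        return ll

def classify_segment(Y, models):
    """Pick the phone whose LDM assigns the highest likelihood to Y.

    Y:      (T x 14) array of articulatory frames for one segment
    models: dict mapping phone labels to PhoneLDM instances
    """
    return max(models, key=lambda phone: models[phone].log_likelihood(Y))
```

In this sketch the recurrent network's role is simply to supply the matrix `Y` of articulatory frames; any choice of network that emits one 14-dimensional frame per time step would fit the same interface.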