Paper Information - Computational Models for Speech Production

Computational Models for Speech Production

Major speech production models from speech science literature and a number of popular statistical “generative” models of speech used in speech technology are surveyed. Strengths and weaknesses of these two styles of speech models are analyzed, pointing to the need to integrate the respective strengths while eliminating the respective weaknesses. As an example, a statistical task-dynamic model of speech production is described, motivated by the original deterministic version of the model and targeted for integrated-multilingual speech recognition applications. Methods for model parameter learning (training) and for likelihood computation (recognition) are described based on statistical optimization principles integrated in neural network and dynamic system theories.

Li Deng

[1] Kenneth N. Stevens, et al. On the quantal nature of speech, 1972.

[2] Chin-W. Kim, et al. Models of Speech Production, 1972, Formal Aspects of Cognitive Processes.

[3] Raymond D. Kent, et al. Chapter 3 – Models of Speech Production, 1976.

[4] G. Kitagawa, et al. Smoothness Priors in Time Series, 1987.

[5] Elliot L. Saltzman, et al. A Dynamical Approach to Gestural Patterning in Speech Production, 1989.

[6] Li Deng, et al. A generalized hidden Markov model with state-conditioned trend functions of time for the speech signal, 1992, Signal Process.

[7] Oded Ghitza, et al. Hidden Markov models with templates as non-stationary states: an application to speech recognition, 1993, Comput. Speech Lang.

[8] Herbert Gish, et al. A segmental speech model with applications to word spotting, 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[9] J. R. Rohlicek, et al. ML estimation of a stochastic linear system with the EM algorithm and its application to speech recognition, 1993, IEEE Trans. Speech Audio Process.

[10] L. Deng. Design of a feature-based speech recognizer aiming at integration of auditory processing, signal modeling, and phonological structure of speech, 1993.

[11] Li Deng, et al. Speech recognition using the atomic speech units constructed from overlapping articulatory features, 1994, EUROSPEECH.