A mixture linear model with target-directed dynamics for spontaneous speech recognition

In this paper, a mixture linear dynamic model (MLDM) for speech recognition is developed and evaluated, in which several linear dynamic systems are combined (mixed) to represent different vocal-tract-resonance (VTR) dynamic behaviors and the mapping relationships between the VTRs and the acoustic observations. Each linear dynamic model is formulated as a state-space system, where the target-directed dynamic property of the VTRs is incorporated in the state equation, and a linear regression function is used as the observation equation to approximate the nonlinear VTR-to-acoustics mapping in a piecewise linear manner. A version of the generalized EM algorithm is developed for learning the model parameters, where the VTR targets are constrained to change only at the segmental level (rather than at the frame level) in both the parameter-learning and model-scoring algorithms. Speech recognition experiments are carried out to evaluate the new model using the N-best re-scoring paradigm on a Switchboard task. Compared with a baseline recognizer using a triphone HMM acoustic model, the new recognizer demonstrates superior performance under a number of experimental conditions.
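To make the target-directed state-space formulation concrete, the following is a minimal simulation sketch of a single mixture component, not the paper's implementation. The names `Phi` (system matrix), `T` (segmental VTR target), and `H`, `h` (linear regression parameters of the observation equation) are illustrative assumptions; the state equation drives the hidden VTR state toward its target, and the observation equation maps it linearly to the acoustic space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed parameters for one mixture component (illustrative values only).
Phi = np.diag([0.9, 0.85, 0.8])          # stable system matrix (|eigenvalues| < 1)
T = np.array([500.0, 1500.0, 2500.0])    # segmental VTR target (Hz), fixed within a segment
H = rng.standard_normal((12, 3)) * 0.01  # linear map from VTR state to acoustic features
h = rng.standard_normal(12) * 0.1        # regression bias of the observation equation

def simulate(n_frames, x0, q_std=1.0, r_std=0.01):
    """Target-directed state equation:
         x[t+1] = Phi @ x[t] + (I - Phi) @ T + w[t]
       so x[t] is pulled toward the target T as t grows.
       Piecewise linear observation equation:
         o[t] = H @ x[t] + h + v[t]."""
    x = np.asarray(x0, dtype=float)
    states, observations = [], []
    for _ in range(n_frames):
        states.append(x.copy())
        observations.append(H @ x + h + r_std * rng.standard_normal(12))
        # State noise w[t] with standard deviation q_std.
        x = Phi @ x + (np.eye(3) - Phi) @ T + q_std * rng.standard_normal(3)
    return np.array(states), np.array(observations)

states, obs = simulate(100, x0=[300.0, 1200.0, 2000.0])
```

Because the system matrix is stable, the state trajectory converges toward `T` regardless of the initial VTR values, which is the target-directed property exploited in the model; the full MLDM mixes several such components and selects among them during scoring.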