Speech recognition using a strong correlation assumption for the instantaneous spectra

The conventional independence assumption made for the evolving speech spectra is replaced by a strong correlation assumption, which then leads to a new stochastic model. This model implements a nonlinear interpolation between the lower and upper bounds of the joint probability distributions. The advantage of the new model over other correlation-based modelling approaches is that it has a low parameter complexity, the same as that in models based on the independence-assumption. Experiments on a speaker-independent E-set database show the effectiveness of this new modelling approach.

[1]  Martin Russell,et al.  A segmental HMM for speech pattern modelling , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[2]  Patrick Kenny,et al.  A linear predictive HMM for vector-valued observations with applications to speech recognition , 1990, IEEE Trans. Acoust. Speech Signal Process..

[3]  George Zavaliagkos,et al.  A hybrid segmental neural net/hidden Markov model system for continuous speech recognition , 1994, IEEE Trans. Speech Audio Process..

[4]  Mari Ostendorf,et al.  A stochastic segment model for phoneme-based continuous speech recognition , 1989, IEEE Trans. Acoust. Speech Signal Process..

[5]  Kuldip K. Paliwal,et al.  Use of temporal correlation between successive frames in a hidden Markov model based speech recognizer , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[6]  Oded Ghitza,et al.  Hidden Markov models with templates as non-stationary states: an application to speech recognition , 1993, Comput. Speech Lang..

[7]  Mari Ostendorf,et al.  A Dynamical System Approach to Continuous Speech Recognition , 1991, HLT.

[8]  Francis Jack Smith,et al.  A hidden Markov model with optimized inter-frame dependence , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.