Large-vocabulary speaker-independent continuous speech recognition with semi-continuous hidden Markov models

A semi-continuous hidden Markov model based on the multiple vector quantization codebooks is used here for large-vocabulary speaker-independent continuous speech recognition In the techniques employed here, the semi-continuous output probability density function for each codebook is represented by a combination of the corresponding discrete output probabilities of the hidden Markov model and the continuous Gaussian density functions of each individual codebook. Parameters of vector quantization codebook and hidden Markov model are mutually optimized to achieve an optimal model codebook combination under a unified probabilistic framework Another advantages of this approach is the enhanced robustness of the semi-continuous output probability by the combination of multiple codewords and multiple codebooks For a 1000-word speaker-independent continuous speech recognition using a word-pair grammar the recognition error rate of the semi-continuous hidden Markov model was reduced by more than 29% and 41% in comparison to the discrete and continuous mixture hidden Markov model respectively

[1]  Xuedong Huang,et al.  Semi-continuous hidden Markov models for speech recognition , 1989 .

[2]  Jerome R. Bellegarda,et al.  Tied mixture continuous parameter models for large vocabulary isolated speech recognition , 1989, International Conference on Acoustics, Speech, and Signal Processing,.

[3]  Raj Reddy,et al.  Large-vocabulary speaker-independent continuous speech recognition: the sphinx system , 1988 .

[4]  Vishwa Gupta,et al.  Integration of acoustic information in a large vocabulary word recognizer , 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[5]  Mei-Yuh Hwang,et al.  The SPHINX speech recognition system , 1989, International Conference on Acoustics, Speech, and Signal Processing,.

[6]  B.-H. Juang,et al.  Maximum-likelihood estimation for mixture multivariate stochastic observations of Markov chains , 1985, AT&T Technical Journal.

[7]  R. Redner,et al.  Mixture densities, maximum likelihood, and the EM algorithm , 1984 .

[8]  L. R. Rabiner,et al.  Recognition of isolated digits using hidden Markov models with continuous mixture densities , 1985, AT&T Technical Journal.

[9]  Frank K. Soong,et al.  High performance connected digit recognition using hidden Markov models , 1989, IEEE Trans. Acoust. Speech Signal Process..

[10]  Douglas B. Paul The Lincoln Continuous Speech Recognition System: Recent Developments and Results , 1989, HLT.

[11]  M. Jack,et al.  Hidden Markov modelling of speech based on a semicontinuous model , 1988 .

[12]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[13]  George R. Doddington Phonetically sensitive discriminants for improved speech recognition , 1989, International Conference on Acoustics, Speech, and Signal Processing,.

[14]  L. Baum,et al.  A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains , 1970 .

[15]  F. Jelinek,et al.  Continuous speech recognition by statistical methods , 1976, Proceedings of the IEEE.

[16]  Hsiao-Wuen Hon,et al.  Multiple codebook semi-continuous hidden Markov models for speaker-independent continuous speech recognition , 1989 .

[17]  Frederick Jelinek,et al.  Interpolated estimation of Markov source parameters from sparse data , 1980 .