Multiple codebook semi-continuous hidden Markov models for speaker-independent continuous speech recognition

A semi-continuous hidden Markov model based on the multiple vector quantization codebooks is used here for large-vocabulary speaker-independent continuous speech recognition. In the techniques employed here, the semi-continuous output probability density function for each codebook is represented by a combination of the corresponding discrete output probabilities of the hidden Markov model and the continuous Gaussian density functions of each individual codebook. Parameters of the vector quantization codebook and the hidden Markov model are mutually optimized to achieve an optimal model/codebook combination under a unified probabilistic framework. Another advantage of this approach is the enhanced robustness of the semi-continuous output probability density function by the combination of multiple codewords and multiple codebooks. For a 1000-word speaker-independent continuous speech recognition using a word-pair grammar, the recognition error rate of the semi-continuous hidden Markov model was reduced by more than 29% and 40% in comparison to the discrete and continuous mixture hidden Markov model respectively. This research was sponsored in part by the Defense Advanced Research Projects Agency under Contract N00039-85-C-0163. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of the Defense Advanced Research Projects Agency, or the US Government. X.D. Huang is a holder of an Edinburgh University Studentship and ORS Awards. Visiting scientist from CSTR, University of Edinburgh, 80, South Bridge, Edinburgh EH1 1HN, Scodand

[1]  Chris Barry,et al.  Robust smoothing methods for discrete hidden Markov models , 1989, International Conference on Acoustics, Speech, and Signal Processing,.

[2]  B.-H. Juang,et al.  Maximum-likelihood estimation for mixture multivariate stochastic observations of Markov chains , 1985, AT&T Technical Journal.

[3]  Jerome R. Bellegarda,et al.  Tied mixture continuous parameter models for large vocabulary isolated speech recognition , 1989, International Conference on Acoustics, Speech, and Signal Processing,.

[4]  L. Baum,et al.  A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains , 1970 .

[5]  Frederick Jelinek,et al.  The development of an experimental discrete dictation recognizer , 1985 .

[6]  F. Jelinek,et al.  Continuous speech recognition by statistical methods , 1976, Proceedings of the IEEE.

[7]  A. Poritz,et al.  On hidden Markov models in isolated word recognition , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[8]  Edward A. Lee,et al.  Fuzzy vector quantazation applied to hidden Markov modeling , 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[9]  George R. Doddington Phonetically sensitive discriminants for improved speech recognition , 1989, International Conference on Acoustics, Speech, and Signal Processing,.

[10]  R. Redner,et al.  Mixture densities, maximum likelihood, and the EM algorithm , 1984 .

[11]  Mei-Yuh Hwang,et al.  The SPHINX speech recognition system , 1989, International Conference on Acoustics, Speech, and Signal Processing,.

[12]  L. Rabiner,et al.  An introduction to hidden Markov models , 1986, IEEE ASSP Magazine.

[13]  John Makhoul,et al.  BYBLOS: The BBN continuous speech recognition system , 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[14]  L. Baum,et al.  An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov process , 1972 .

[15]  Frank K. Soong,et al.  High performance connected digit recognition, using hidden Markov models , 1988, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.

[16]  Xuedong Huang,et al.  Semi-continuous hidden Markov models for speech recognition , 1989 .

[17]  Masafumi Nishimura,et al.  HMM-Based speech recognition using multi-dimensional multi-labeling , 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[18]  Raj Reddy,et al.  Large-vocabulary speaker-independent continuous speech recognition: the sphinx system , 1988 .

[19]  Vishwa Gupta,et al.  Integration of acoustic information in a large vocabulary word recognizer , 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[20]  Xuedong Huang,et al.  Unified techniques for vector quantization and hidden Markov modeling using semi-continuous models , 1989, International Conference on Acoustics, Speech, and Signal Processing,.

[21]  Douglas B. Paul The Lincoln Continuous Speech Recognition System: Recent Developments and Results , 1989, HLT.

[22]  J. Makhoul,et al.  Vector quantization in speech coding , 1985, Proceedings of the IEEE.

[23]  Peter F. Brown,et al.  The acoustic-modeling problem in automatic speech recognition , 1987 .

[24]  Frederick Jelinek,et al.  Interpolated estimation of Markov source parameters from sparse data , 1980 .

[25]  M. Jack,et al.  Hidden Markov modelling of speech based on a semicontinuous model , 1988 .

[26]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[27]  L. R. Rabiner,et al.  Recognition of isolated digits using hidden Markov models with continuous mixture densities , 1985, AT&T Technical Journal.