A new hybrid algorithm for speech recognition based on HMM segmentation and learning vector quantization

A hybrid speech recognition algorithm that combines hidden Markov models (HMMs) and learning vector quantization (LVQ) is presented. LVQ training algorithms can produce highly discriminative reference vectors for classifying static patterns, i.e., vectors of fixed dimension. The HMM formulation, for its part, has been applied successfully to recognizing dynamic speech patterns of variable duration. It is shown that combining LVQ's discriminative power with the HMM's ability to model the temporal variation of speech in a hybrid algorithm significantly improves the performance of the original HMM-based speech recognizer. On a highly confusable vocabulary, the nine American English E-set letters, tested in multispeaker, isolated-word mode, the baseline HMM recognizer achieves an average word accuracy of 67%; incorporating LVQ in the hybrid system raises the word accuracy to 83%.
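
The abstract does not say exactly how the HMM segmentation feeds the LVQ classifier. The sketch below is one plausible reading, assuming NumPy: a left-to-right model (here a single shared model described only by hypothetical per-state mean vectors) segments each utterance by Viterbi alignment, the frames assigned to each state are averaged and concatenated into a fixed-dimension vector, and Kohonen's basic LVQ1 rule trains the reference vectors. The function names, the shared segmentation model, the averaging step, and the choice of LVQ1 are all illustrative assumptions, not the authors' method.

```python
import numpy as np


def viterbi_segment(frames, state_means):
    """Align frames (T x D) to a left-to-right model with S states (S x D).

    Illustrative assumption: unit-variance Gaussian emissions and uniform
    stay/advance transitions, so the best path simply minimizes the total
    squared distance. Every state receives at least one frame.
    Returns a length-T array of state indices.
    """
    T, S = len(frames), len(state_means)
    if T < S:
        raise ValueError("need at least one frame per state")
    # cost[t, s] = squared Euclidean distance of frame t to the mean of state s
    cost = ((frames[:, None, :] - state_means[None, :, :]) ** 2).sum(axis=2)
    acc = np.full((T, S), np.inf)        # best accumulated cost ending in (t, s)
    back = np.zeros((T, S), dtype=int)   # predecessor state for backtracking
    acc[0, 0] = cost[0, 0]               # paths must start in the first state
    for t in range(1, T):
        for s in range(S):
            stay = acc[t - 1, s]
            advance = acc[t - 1, s - 1] if s > 0 else np.inf
            back[t, s] = s - 1 if advance < stay else s
            acc[t, s] = min(stay, advance) + cost[t, s]
    # Backtrack from the last state at the last frame.
    path = np.empty(T, dtype=int)
    path[-1] = S - 1
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path


def segment_vector(frames, state_means):
    """Map a variable-length utterance to a fixed (S * D)-dimensional vector
    by averaging the frames assigned to each state (hypothetical feature)."""
    path = viterbi_segment(frames, state_means)
    return np.concatenate([frames[path == s].mean(axis=0)
                           for s in range(len(state_means))])


def lvq1_train(X, y, codebook, labels, lr=0.05, epochs=20, seed=0):
    """Kohonen's basic LVQ1 rule (an illustrative stand-in for the paper's
    LVQ variant): move the nearest reference vector toward a sample of its
    own class and away from a sample of another class."""
    rng = np.random.default_rng(seed)
    W = np.asarray(codebook, dtype=float).copy()
    for epoch in range(epochs):
        alpha = lr * (1.0 - epoch / epochs)  # linearly decaying step size
        for i in rng.permutation(len(X)):
            k = np.argmin(((W - X[i]) ** 2).sum(axis=1))  # nearest reference vector
            sign = 1.0 if labels[k] == y[i] else -1.0     # attract or repel
            W[k] += sign * alpha * (X[i] - W[k])
    return W


def lvq_classify(x, W, labels):
    """Return the label of the reference vector nearest to x."""
    return labels[np.argmin(((W - x) ** 2).sum(axis=1))]
```

Because every utterance, whatever its length, is reduced to one mean vector per state, the LVQ stage always receives vectors of the same dimension, which is the property the abstract attributes to static-pattern classifiers. A real system would more likely use a trained HMM per word and a discriminative LVQ variant such as LVQ2; the sketch only illustrates the data flow.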
