Re-evaluation of LVQ-HMM hybrid algorithm

The LVQ-HMM hybrid algorithm was one of the first algorithms proposed in a recent line of work that aims to integrate a highly discriminative classifier based on artificial neural networks with an HMM, which can represent temporal structure effectively. The high phoneme classification capability of LVQ-HMM has already been demonstrated. However, its performance has been less striking in more difficult, large-scale speech recognition situations. This has made evaluation of the algorithm controversial and calls for a more detailed investigation of its properties in such situations. This technical report is therefore devoted to a re-evaluation of the hybrid algorithm on word and phrase recognition tasks. Specifically, recognition experiments are conducted under rather difficult, speaker-independent, large-vocabulary conditions. Our recognizer uses a phoneme-based strategy; in particular, it incorporates the predictive LR parser for efficient recognition. Unfortunately, the experimental results alone are insufficient to settle the controversy. However, they bring to light possible contributions of the algorithm as well as aspects that need further improvement.
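The abstract does not spell out how the LVQ classifier and the HMM are coupled. One possible coupling, assumed here purely for illustration, is to use LVQ-trained reference vectors as the vector-quantization codebook of a discrete HMM: each acoustic frame is replaced by the index of its nearest reference vector, and the HMM scores the resulting symbol sequence with the forward algorithm. The sketch below makes further simplifying assumptions: it uses the basic LVQ1 update rather than the LVQ2/LVQ3 variants, toy 2-D points instead of real acoustic features, hand-set HMM parameters instead of trained ones, and hypothetical function names (`nearest`, `lvq1_train`, `forward_log_prob`). It is not the recognizer evaluated in the report.

```python
import math
import random

def nearest(codebook, x):
    """Index of the reference vector closest to x (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((c - v) ** 2 for c, v in zip(codebook[k], x)))

def lvq1_train(codebook, cb_labels, data, data_labels, lr=0.05, epochs=20, seed=0):
    """LVQ1 (assumed simplification): pull the winning reference vector toward a
    sample of the same class, push it away from a sample of another class."""
    rng = random.Random(seed)
    codebook = [list(c) for c in codebook]        # work on a copy
    order = list(range(len(data)))
    for epoch in range(epochs):
        rng.shuffle(order)
        alpha = lr * (1 - epoch / epochs)          # linearly decaying step size
        for i in order:
            x, y = data[i], data_labels[i]
            k = nearest(codebook, x)
            sign = 1.0 if cb_labels[k] == y else -1.0
            codebook[k] = [c + sign * alpha * (v - c) for c, v in zip(codebook[k], x)]
    return codebook

def forward_log_prob(symbols, pi, A, B):
    """Scaled forward algorithm: log P(symbols | discrete HMM with pi, A, B)."""
    n = len(pi)
    alpha = [pi[s] * B[s][symbols[0]] for s in range(n)]
    log_p = 0.0
    for t in range(len(symbols)):
        if t > 0:
            alpha = [sum(alpha[r] * A[r][s] for r in range(n)) * B[s][symbols[t]]
                     for s in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        log_p += math.log(scale)                   # accumulate log of scaling factors
        alpha = [a / scale for a in alpha]         # renormalize to avoid underflow
    return log_p

# Toy data: two hypothetical "phoneme" classes, two reference vectors per class.
rng = random.Random(1)
data = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
        + [(rng.gauss(4, 1), rng.gauss(4, 1)) for _ in range(50)])
labels = [0] * 50 + [1] * 50
cb_labels = [0, 0, 1, 1]
codebook = lvq1_train([(1.0, 1.0), (-1.0, 0.0), (3.0, 3.0), (5.0, 4.0)],
                      cb_labels, data, labels)

# Vector-quantize a toy "utterance": two class-0 frames followed by two class-1 frames.
frames = [(0.1, -0.2), (0.3, 0.0), (4.2, 3.9), (3.8, 4.1)]
symbols = [nearest(codebook, f) for f in frames]

# Hand-set 2-state left-to-right HMM: state 0 emits class-0 symbols, state 1 class-1.
pi = [1.0, 0.0]
A = [[0.6, 0.4], [0.0, 1.0]]
B = [[0.45, 0.45, 0.05, 0.05], [0.05, 0.05, 0.45, 0.45]]
lp_correct = forward_log_prob(symbols, pi, A, B)
lp_reversed = forward_log_prob(symbols, pi, A, [B[1], B[0]])   # wrong-order model
print(f"log P, correct-order model:  {lp_correct:.3f}")
print(f"log P, reversed-order model: {lp_reversed:.3f}")
```

The comparison at the end shows the division of labour the hybrid relies on: the LVQ codebook supplies class-discriminative frame labels, while the HMM's left-to-right topology scores their temporal order, so the model whose state sequence matches the frame order receives the higher log probability.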
