Improving Speech Recognition Performance via Phone-Dependent VQ Codebooks and Adaptive Language Models in SPHINX-II
M. Hwang, R. Rosenfeld, E. Thayer, R. Mosur, L. Chase, R. Weide

This paper presents improvements in acoustic and language modeling for automatic speech recognition. Specifically, semi-continuous HMMs (SCHMMs) with phone-dependent VQ codebooks are presented and incorporated into the SPHINX-II speech recognition system. The phone-dependent VQ codebooks relax the density-tying constraint in SCHMMs in order to obtain more detailed models. A 6% error rate reduction was achieved on the speaker-independent 20,000-word Wall Street Journal (WSJ) task. Dynamic adaptation of the language model in the context of long documents is also explored. A maximum entropy framework is used to exploit long-distance trigrams and trigger effects. A 10% to 15% word error rate reduction is reported on the same WSJ task using the adaptive language modeling technique.
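The abstract names two techniques without giving their formulas. The following is a minimal sketch of both in plain Python, under stated assumptions, and is not the SPHINX-II implementation. On the acoustic side it computes the standard SCHMM output probability b_j(x) = Σ_k f(x | v_k) b_j(k), where the shared Gaussian codebook is selected by the state's phone class. This is how phone-dependent codebooks loosen the tying of densities across all states. On the language side it evaluates a maximum entropy model P(w | h) = exp(Σ_i λ_i f_i(h, w)) / Z(h) with one trigram feature and one trigger feature. The names `phone_class`, `top_n` (a common top-N codeword shortcut, assumed here), the toy codebooks, the single `history` list used for both trigram context and document history, and the weights in `lambdas` are all hypothetical. Real weights would be estimated from training data, for example by generalized iterative scaling.

```python
import math

# ---- Acoustic side: SCHMM output probability, phone-dependent codebooks ----

def diag_gauss_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def schmm_output_prob(x, codebook, weights, top_n=4):
    """b_j(x) = sum_k f(x | v_k) * b_j(k), summed over the top_n best codewords.

    codebook: list of (mean, var) Gaussians shared by every state that uses it.
    weights:  this state's discrete output distribution b_j(k) over codewords.
    """
    dens = [math.exp(diag_gauss_logpdf(x, m, v)) for m, v in codebook]
    best = sorted(range(len(dens)), key=lambda k: dens[k], reverse=True)[:top_n]
    return sum(dens[k] * weights[k] for k in best)


def state_output_prob(x, state, codebooks):
    """Phone-dependent variant: a state mixes only the Gaussians in the codebook
    of its own phone class, rather than one codebook tied across all states.
    The 'phone_class' key is a hypothetical representation."""
    return schmm_output_prob(x, codebooks[state["phone_class"]], state["weights"])


# ---- Language side: maximum entropy model with trigram and trigger features ----

def trigram_feature(u, v, w):
    """Fires when the last two history words are (u, v) and the next word is w."""
    return lambda h, x: 1.0 if h[-2:] == [u, v] and x == w else 0.0


def trigger_feature(a, b):
    """Fires when trigger word a occurred earlier in the history and x == b."""
    return lambda h, x: 1.0 if a in h and x == b else 0.0


def maxent_prob(word, history, vocab, features, lambdas):
    """P(w | h) = exp(sum_i lambda_i * f_i(h, w)) / Z(h), with Z(h) summed over vocab."""
    def score(w):
        return math.exp(sum(lam * f(history, w) for f, lam in zip(features, lambdas)))
    return score(word) / sum(score(w) for w in vocab)


# ---- Toy usage; every number below is made up for illustration ----
codebooks = {
    "vowel": [((0.0,), (1.0,)), ((3.0,), (1.0,))],
    "fricative": [((10.0,), (1.0,))],
}
vowel_state = {"phone_class": "vowel", "weights": [0.7, 0.3]}
fric_state = {"phone_class": "fricative", "weights": [1.0]}
b = state_output_prob((0.0,), vowel_state, codebooks)
b_fric = state_output_prob((0.0,), fric_state, codebooks)

vocab = ["stock", "shares", "rose", "fell"]
features = [trigram_feature("the", "company", "shares"),
            trigger_feature("stock", "shares")]
lambdas = [0.5, 1.2]  # hypothetical weights; in practice estimated from data
history = ["stock", "prices", "at", "the", "company"]
p_shares = maxent_prob("shares", history, vocab, features, lambdas)
```

In the toy run, the vowel and fricative states score the same frame against different codebooks. Because "stock" appeared earlier in the history, the trigger raises the probability of "shares" above the other candidates even before any trigram evidence is counted.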
