The development of the 1994 HTK large vocabulary speech recognition system

This paper describes recent developments of the HTK large vocabulary continuous speech recognition system. The system uses tied-state cross-word context-dependent mixture Gaussian HMMs and a dynamic network decoder that can operate in a single pass. In the last year the decoder has been extended to produce word lattices to allow flexible and efficient system development, as well as multi-pass operation for use with computationally expensive acoustic and/or language models. The system vocabulary can now be up to 65k words, the final acoustic models have been extended to be sensitive to more acoustic context (quinphones), a 4-gram language model has been used and unsupervised incremental speaker adaptation incorporated. The resulting system gave the lowest error rates on both the H1-P0 and H1-C1 hub tasks in the November 1994 ARPA CSR evaluation.
