Comparison of part-of-speech and automatically derived category-based language models for speech recognition

This paper compares various category-based language models when combined with a word-based trigram by linear interpolation. Categories corresponding to parts-of-speech as well as automatically clustered groupings are considered. The category-based model employs variable-length n-grams and permits each word to belong to multiple categories. Relative word error rate reductions of between 2% and 7% over the baseline are achieved in N-best rescoring experiments on the Wall Street Journal corpus. The largest improvement is obtained with a model using automatically determined categories. Perplexities continue to decrease as the number of different categories is increased, but improvements in the word error rate reach an optimum.
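
As a rough illustration of the linear interpolation mentioned above, the following minimal sketch mixes two per-word probabilities and computes the perplexity of the result. The function names, the weight `lam`, and all probability values are hypothetical and chosen only for illustration; they are not taken from the paper, and the paper's actual estimation of the interpolation weight is not shown.

```python
import math

def interpolate(p_word, p_cat, lam):
    """Linearly interpolate a word-based and a category-based probability.

    lam is the weight given to the word-based model (assumed in [0, 1]).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_word + (1.0 - lam) * p_cat

def perplexity(probs):
    """Perplexity of a word sequence from its per-word probabilities."""
    log_sum = sum(math.log(p) for p in probs)
    return math.exp(-log_sum / len(probs))

# Hypothetical per-word probabilities from the two models for a 3-word text.
p_word_model = [0.20, 0.01, 0.05]
p_cat_model = [0.10, 0.04, 0.05]

# Mix the two models word by word with an assumed weight of 0.5.
mixed = [interpolate(w, c, 0.5) for w, c in zip(p_word_model, p_cat_model)]

print("word-based perplexity:  ", perplexity(p_word_model))
print("interpolated perplexity:", perplexity(mixed))
```

In practice the weight would be tuned on held-out data rather than fixed by hand; the value 0.5 here is only a placeholder.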
