Cooccurrence smoothing for stochastic language modeling

Training corpora for stochastic language models are virtually always too small for maximum-likelihood estimation, so smoothing the models is of great importance. The authors derive the cooccurrence smoothing technique for stochastic language modeling and give experimental evidence for its validity. Using word-bigram language models, cooccurrence smoothing improved the test-set perplexity by 14% on a 100,000-word German text corpus and by 10% on a 1-million-word English corpus.
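
To make the idea concrete, here is a minimal sketch of cooccurrence smoothing for a word-bigram model. It is not the paper's exact formulation: the confusion-matrix construction follows one common reading of the technique (words are similar if they tend to follow the same histories), and the function name, the interpolation weight `lam`, and its default value are all illustrative assumptions rather than details from the paper.

```python
import numpy as np

def cooccurrence_smoothed_bigram(counts, lam=0.8):
    """Sketch of cooccurrence smoothing for a word-bigram model.

    counts : (V, V) array; counts[h, w] = number of times word w follows word h.
    lam    : interpolation weight between the ML model and its smoothed
             counterpart (an illustrative knob, not a value from the paper).
    """
    # Maximum-likelihood conditional bigram probabilities P_ML(w | h).
    row = counts.sum(axis=1, keepdims=True)
    p_ml = np.divide(counts, np.maximum(row, 1))

    # Column-normalized counts give P(h | w), the distribution over
    # histories that precede each word.
    col = counts.sum(axis=0, keepdims=True)
    p_h_given_w = np.divide(counts, np.maximum(col, 1))

    # Confusion (cooccurrence) matrix: P_C(w' | w) = sum_h P(h | w) P_ML(w' | h).
    # Two words are confusable if they follow similar histories.
    p_conf = p_h_given_w.T @ p_ml

    # Smoothed model: redistribute each word's probability mass over its
    # confusable neighbors, P_S(w | h) = sum_w' P_ML(w' | h) P_C(w | w'),
    # then interpolate with the unsmoothed ML estimate.
    p_smooth = p_ml @ p_conf
    return lam * p_ml + (1 - lam) * p_smooth
```

The effect is that a bigram (h, w) never seen in training can still receive probability mass, borrowed from observed bigrams (h, w') whose successor w' cooccurs in contexts similar to w; the interpolation keeps the smoothed estimate from washing out well-trained bigrams.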
