Cooccurrence smoothing for stochastic language modeling
Training corpora for stochastic language models are virtually always too small for maximum-likelihood estimation, so smoothing the models is of great importance. The authors derive the cooccurrence smoothing technique for stochastic language modeling and give experimental evidence for its validity. Using word-bigram language models, cooccurrence smoothing improved the test-set perplexity by 14% on a German 100,000-word text corpus and by 10% on an English 1-million-word corpus.
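The abstract does not spell out the estimator, but a minimal sketch of the commonly cited formulation of cooccurrence smoothing may help: a confusion probability measures how often two words share the same bigram histories, and that probability is used to redistribute the maximum-likelihood bigram mass before interpolating with the unsmoothed estimate. The toy corpus, the interpolation weight `lam`, and all function names below are hypothetical illustrations; the paper's exact estimators and combination scheme may differ.

```python
from collections import Counter
import math

# Toy corpus for illustration only; the paper's German and English corpora are not reproduced here.
corpus = "the cat sat on the mat and the dog sat on the rug".split()
vocab = sorted(set(corpus))

uni = Counter(corpus)                  # unigram counts c(w)
bi = Counter(zip(corpus, corpus[1:]))  # bigram counts c(h, w)

def p_ml(w, h):
    """Maximum-likelihood bigram estimate P(w | h)."""
    return bi[(h, w)] / uni[h] if uni[h] else 0.0

def p_hist(h, w):
    """P(h | w): relative frequency of history h among the observed predecessors of w."""
    total = sum(bi[(x, w)] for x in vocab)
    return bi[(h, w)] / total if total else 0.0

def p_confusion(w1, w2):
    """Confusion probability P_C(w1 | w2): overlap of the histories after which w1 and w2 occur."""
    return sum(p_ml(w1, h) * p_hist(h, w2) for h in vocab)

def p_cooc(w, h):
    """Cooccurrence-smoothed bigram: redistribute P_ML(w' | h) mass via the confusion probability."""
    return sum(p_confusion(w, w2) * p_ml(w2, h) for w2 in vocab)

def p_smoothed(w, h, lam=0.8):
    """Linear interpolation of ML and cooccurrence-smoothed estimates (lam is a hypothetical weight)."""
    return lam * p_ml(w, h) + (1 - lam) * p_cooc(w, h)

def perplexity(test, p):
    """Test-set perplexity under a conditional bigram model p(w, h)."""
    logprob = sum(math.log(max(p(w, h), 1e-12)) for h, w in zip(test, test[1:]))
    return math.exp(-logprob / (len(test) - 1))

if __name__ == "__main__":
    test = "the dog sat on the mat".split()
    print("ML perplexity:      ", perplexity(test, p_ml))
    print("smoothed perplexity:", perplexity(test, p_smoothed))
```

In this sketch the smoothing never assigns probability to word pairs whose constituents share no histories at all, which is why it would typically be combined with a further back-off or discounting step in practice.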