Assessment of smoothing methods and complex stochastic language modeling

This paper studies the overall effect of language modeling on perplexity and word error rate, progressing from a trigram model with a standard smoothing method to complex state-of-the-art language models: (1) We compare different smoothing methods, namely linear vs. absolute discounting, interpolation vs. backing-off, and back-off functions based on relative frequencies vs. singleton events. (2) We show the effect of complex language model techniques by using distant-trigrams and word classes and word phrases that are selected automatically under a maximum likelihood criterion (i.e., minimum perplexity). (3) We show the overall gain of the combined application of the above techniques, as opposed to their separate assessment in past publications. (4) We give perplexity and word error rate results on the North American Business corpus (NAB), with a training text of about 240 million words, and on the German Verbmobil corpus.
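As a rough illustration of one of the smoothing combinations named in (1), the following is a minimal sketch of absolute discounting with interpolation, together with the perplexity measure. It is not the paper's implementation. It makes several simplifying assumptions: a bigram model instead of a trigram model, a unigram relative-frequency distribution as the lower-order distribution, a closed vocabulary, and a fixed discount of 0.5 chosen only for illustration. The function names are hypothetical.

```python
from collections import Counter
import math


def train_bigram_absolute_discounting(tokens, discount=0.5):
    """Return a function prob(w, v) estimating p(w | v).

    Sketch of interpolated absolute discounting for bigrams.
    Assumptions (not from the paper): fixed discount, unigram
    relative frequencies as the lower-order distribution.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    # N(v): how often v occurs as a history (every position but the last)
    history_counts = Counter(tokens[:-1])
    # Number of distinct words seen after each history v
    successors = Counter(v for (v, _w) in bigrams)
    total = len(tokens)

    def prob(w, v):
        p_uni = unigrams[w] / total
        n_v = history_counts[v]
        if n_v == 0:
            # Unseen history: fall back to the unigram distribution
            return p_uni
        # Subtract a constant discount from every seen bigram count
        discounted = max(bigrams[(v, w)] - discount, 0.0) / n_v
        # Probability mass freed by discounting, redistributed via p_uni
        freed_mass = discount * successors[v] / n_v
        return discounted + freed_mass * p_uni

    return prob


def perplexity(prob, tokens):
    """Perplexity of a token sequence under a bigram model prob(w, v).

    Assumes every word in tokens was seen in training (closed vocabulary);
    otherwise prob may return 0 and math.log raises ValueError.
    """
    pairs = list(zip(tokens, tokens[1:]))
    log_sum = sum(math.log(prob(w, v)) for v, w in pairs)
    return math.exp(-log_sum / len(pairs))
```

With interpolation, the lower-order distribution contributes to every estimate, including seen bigrams. In a backing-off variant it would be consulted only for unseen bigrams, which is one of the contrasts the paper evaluates.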
