A Generalized Language Model as the Combination of Skipped n-grams and Modified Kneser-Ney Smoothing

We introduce a novel approach to building language models based on a systematic, recursive exploration of skip n-gram models, which are interpolated using modified Kneser-Ney smoothing. Our approach generalizes language models in that it contains the classical interpolation with lower-order models as a special case. In this paper we motivate, formalize, and present our approach. In an extensive empirical experiment over English text corpora, we demonstrate that our generalized language models yield a substantial reduction in perplexity, between 3.1% and 12.7%, compared to traditional language models using modified Kneser-Ney smoothing. Furthermore, we investigate the behaviour over three other languages and a domain-specific corpus, where we observe consistent improvements. Finally, we show that the strength of our approach lies in its ability to cope particularly well with sparse training data. Using a very small training set of only 736 KB of text, we achieve a perplexity reduction of as much as 25.7%.
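To make the notion of a skip n-gram concrete, the following Python sketch enumerates, for every n-gram in a token sequence, all ways of skipping positions in its history; this lattice of patterns is what a systematic, recursive exploration ranges over. The function name skip_ngrams and the '_' wildcard are our own illustrative conventions, not the paper's implementation.

```python
from itertools import product

def skip_ngrams(tokens, n):
    """Enumerate all skip patterns of each length-n window.

    For every n-gram, each of the n-1 history positions may independently
    be replaced by the wildcard '_' (a skip). The resulting patterns form
    the lattice of skip n-gram models over which a generalized language
    model can interpolate.
    """
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        # Each mask bit decides whether a history position is skipped.
        for mask in product((False, True), repeat=n - 1):
            pattern = tuple('_' if skip else w
                            for w, skip in zip(window[:-1], mask))
            yield pattern + (window[-1],)

tokens = "the quick brown fox jumps".split()
for gram in skip_ngrams(tokens, 3):
    print(gram)  # e.g. ('the', 'quick', 'brown'), ('_', 'quick', 'brown'), ...
```

Note that the patterns in which only a contiguous prefix of the history is skipped correspond to the classical lower-order n-grams, which is one way to see why interpolation with lower-order models arises as a special case of the generalized scheme.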
