What to Expect from Expected Kneser-Ney Smoothing

Kneser-Ney smoothing on expected counts was recently proposed by Zhang et al. [10]. In this paper we revisit the technique and suggest a number of optimizations and extensions. We then analyze its performance in several practical speech recognition scenarios that depend on fractional sample counts, such as training on uncertain data, language model adaptation, and Word-Phrase-Entity models. We show that the proposed approach to smoothing outperforms known alternatives by a significant margin.
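To make the fractional-count setting concrete, the sketch below is a minimal illustration (not code from this paper) of the expected-count statistics underlying the formulation in [10]. It assumes each potential occurrence of an n-gram is an independent event with a known posterior probability, e.g. read off ASR lattices, and computes expected counts and expected counts-of-counts, the quantities from which Kneser-Ney discounts can be re-derived when counts are fractional. All names and the toy data are hypothetical.

# Minimal sketch, assuming independent per-occurrence posteriors.
from collections import defaultdict

def count_distribution(occurrence_probs):
    """Poisson-binomial DP: p[r] = probability the n-gram occurs exactly r times."""
    p = [1.0]  # before observing anything, the count is 0 with probability 1
    for q in occurrence_probs:
        nxt = [0.0] * (len(p) + 1)
        for r, pr in enumerate(p):
            nxt[r] += pr * (1.0 - q)   # this occurrence is absent
            nxt[r + 1] += pr * q       # this occurrence is present
        p = nxt
    return p

def expected_statistics(ngram_posteriors):
    """Expected counts c~(w) = sum of posteriors, and expected
    counts-of-counts E[n_r] = sum over n-grams of P(c(w) = r)."""
    exp_count = {}
    exp_noc = defaultdict(float)
    for ngram, probs in ngram_posteriors.items():
        exp_count[ngram] = sum(probs)
        for r, pr in enumerate(count_distribution(probs)):
            if r > 0:
                exp_noc[r] += pr
    return exp_count, exp_noc

# Hypothetical toy posteriors for two bigrams.
posteriors = {
    ("the", "cat"): [0.9, 0.6],
    ("the", "hat"): [0.4],
}
counts, noc = expected_statistics(posteriors)

# Absolute discount by analogy with the integer-count estimate
# D = n1 / (n1 + 2 * n2), using expected counts-of-counts instead.
denom = noc[1] + 2.0 * noc[2]
D = noc[1] / denom if denom > 0 else 0.0
print(counts)     # {('the', 'cat'): 1.5, ('the', 'hat'): 0.4}
print(dict(noc))  # E[n_1] = 0.82, E[n_2] = 0.54
print(D)          # roughly 0.43

The dynamic program replaces the integer counts-of-counts of standard Kneser-Ney with their expectations under the occurrence posteriors; with deterministic data (all posteriors equal to 1) it reduces to ordinary counting.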

[1] Reinhard Kneser and Hermann Ney, "Improved backing-off for M-gram language modeling," in Proc. ICASSP, 1995.

[2] Yifan Gong et al., "Don't Count on ASR to Transcribe for You: Breaking Bias with Two Crowds," in Proc. INTERSPEECH, 2017.

[3] Brian Roark et al., "Learning N-Gram Language Models from Uncertain Data," in Proc. INTERSPEECH, 2016.

[4] Brian Roark et al., "MAP adaptation of stochastic grammars," Computer Speech & Language, 2006.

[5] Douglas D. O'Shaughnessy et al., "Topic n-gram count language model adaptation for speech recognition," in Proc. IEEE SLT, 2012.

[6] Stanley F. Chen and Joshua Goodman, "An Empirical Study of Smoothing Techniques for Language Modeling," in Proc. ACL, 1996.

[7] Giuseppe Riccardi et al., "On-line learning of language models with word error probability distributions," in Proc. ICASSP, 2001.

[8] Michael Levit et al., "N-gram Smoothing on Expected Fractional Counts," 2018.

[9] Brian Roark et al., "Unsupervised language model adaptation," in Proc. ICASSP, 2003.

[10] Hui Zhang and David Chiang, "Kneser-Ney Smoothing on Expected Counts," in Proc. ACL, 2014.

[11] Ian H. Witten and Timothy C. Bell, "The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression," IEEE Transactions on Information Theory, 1991.

[12] Andreas Stolcke et al., "Token-level interpolation for class-based language models," in Proc. ICASSP, 2015.

[13] Michael Levit et al., "Word-Phrase-Entity Recurrent Neural Networks for Language Modeling," in Proc. INTERSPEECH, 2016.

[14] Mark J. F. Gales et al., "Context dependent language model adaptation," in Proc. INTERSPEECH, 2008.

[15] Andreas Stolcke et al., "Word-phrase-entity language models: getting more mileage out of n-grams," in Proc. INTERSPEECH, 2014.

[16] Andreas Stolcke, "SRILM - an extensible language modeling toolkit," in Proc. INTERSPEECH, 2002.

[17] Thorsten Brants et al., "Study on interaction between entropy pruning and Kneser-Ney smoothing," in Proc. INTERSPEECH, 2010.

[18] Brian Roark et al., "The OpenGrm open-source finite-state grammar software libraries," in Proc. ACL, 2012.