Iterative language model estimation: efficient data structure & algorithms

Despite the availability of better-performing techniques, most language models are trained using popular toolkits that do not support perplexity optimization. In this work, we present an efficient data structure and optimized algorithms specifically designed for iterative parameter tuning. With the resulting implementation, we demonstrate the feasibility and effectiveness of such iterative techniques in language model estimation.
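
To make "iterative parameter tuning" for perplexity concrete, the following is a minimal sketch and not the data structure or algorithms of this work. It assumes a toy unigram model interpolated with a uniform distribution and tunes the single interpolation weight on held-out text by repeatedly re-evaluating perplexity. The function names, the toy data, and the use of golden-section search as the optimizer are all illustrative assumptions.

```python
import math
from collections import Counter

def perplexity(lam, counts, total, vocab_size, dev_tokens):
    """Dev-set perplexity of a unigram model interpolated with a uniform model."""
    log_sum = 0.0
    for w in dev_tokens:
        p = lam * counts.get(w, 0) / total + (1.0 - lam) / vocab_size
        log_sum += math.log(p)
    return math.exp(-log_sum / len(dev_tokens))

def golden_section_min(f, lo, hi, tol=1e-6):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; reuse c as the new upper probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; reuse d as the new lower probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2

# Hypothetical toy data, chosen only for illustration.
train = "a b a c a b".split()
dev = "a b d a".split()
counts = Counter(train)
total = len(train)
vocab_size = len(set(train) | set(dev))

def objective(lam):
    return perplexity(lam, counts, total, vocab_size, dev)

# Bounds stay strictly inside (0, 1) so unseen dev words keep nonzero probability.
best_lam = golden_section_min(objective, 0.001, 0.999)
print(f"lambda = {best_lam:.3f}, dev perplexity = {objective(best_lam):.3f}")
```

Each objective evaluation here rescans the development set. That repeated cost is what makes iterative tuning expensive at scale, which is the motivation for a data structure designed around it.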
