Quantization-based language model compression

This paper describes two techniques for reducing the memory footprint of statistical back-off n-gram language models. Compression is achieved by quantizing the language model probabilities and back-off weights, and by pruning parameters that are found to be redundant after quantization. Recognition performance with the original and compressed language models is evaluated on three different language models and two different recognition tasks. The results show that the language models can be compressed by up to 60% of their original size with no significant loss in recognition performance. Moreover, the techniques described provide a principled method for compressing language models further while minimising the degradation in recognition performance.
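The abstract does not spell out the quantizer, so the following is only a minimal sketch of one plausible reading: a 1-D Lloyd-style (least-squares, k-means-like) scalar quantizer applied to n-gram log-probabilities, after which each parameter is stored as a small integer index into a shared codebook. The function name, the codebook sizes, and the toy data are illustrative assumptions, not the paper's implementation.

```python
# Sketch of 1-D Lloyd (least-squares) quantization of log-probabilities.
# Assumed/illustrative: names, codebook size, and toy data below.
import random

def lloyd_quantize(values, num_levels=256, iterations=20):
    """Return (codebook, indices) for a 1-D Lloyd / k-means style quantizer."""
    # Initialise the codebook with evenly spaced sample quantiles of the data.
    data = sorted(values)
    codebook = [data[int(i * (len(data) - 1) / (num_levels - 1))]
                for i in range(num_levels)]
    indices = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: map each value to its nearest codeword.
        sums = [0.0] * num_levels
        counts = [0] * num_levels
        for i, v in enumerate(values):
            k = min(range(num_levels), key=lambda j: (v - codebook[j]) ** 2)
            indices[i] = k
            sums[k] += v
            counts[k] += 1
        # Update step: move each codeword to the centroid of its cell.
        for k in range(num_levels):
            if counts[k]:
                codebook[k] = sums[k] / counts[k]
    return codebook, indices

if __name__ == "__main__":
    # Toy stand-in for n-gram log10 probabilities (not real language-model data).
    random.seed(0)
    logprobs = [random.uniform(-6.0, 0.0) for _ in range(10000)]
    codebook, idx = lloyd_quantize(logprobs, num_levels=16)
    # Each parameter now needs only log2(16) = 4 bits plus the shared codebook,
    # instead of a full 32-bit float per probability or back-off weight.
    max_err = max(abs(logprobs[i] - codebook[idx[i]]) for i in range(len(logprobs)))
    print(f"codebook size: {len(codebook)}, max quantization error: {max_err:.4f}")
```

Under this reading, the complementary pruning step would drop entries whose quantized values make them redundant (for example, back-off weights that quantize to a level indistinguishable from the default), which is consistent with, but not confirmed by, the abstract.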
