Paper information - Modeling of term-distance and term-occurrence information for improving n-gram language model performance

In this paper, we explore the use of distance and co-occurrence information of word-pairs for language modeling. We extract this information from history-contexts of up to ten words and find that it complements the n-gram model well, since the n-gram model inherently suffers from data scarcity when learning long history-contexts. Evaluated on the WSJ corpus, bigram and trigram model perplexities were reduced by up to 23.5% and 14.0%, respectively. We also show that, compared to the distant bigram, word-pairs can be modeled more effectively in terms of both distance and occurrence.
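To make the two kinds of word-pair information concrete, here is a minimal Python sketch of how they might be counted from a token sequence. It is not the authors' implementation: the function name `pair_statistics`, the parameter `max_distance`, and the toy sentence are assumptions made for illustration, and the sketch only gathers raw counts without any smoothing or combination with an n-gram model.

```python
from collections import Counter


def pair_statistics(tokens, max_distance=10):
    """Count word-pair statistics over history-contexts of up to max_distance words.

    Hypothetical helper for illustration only; it returns raw counts.
    """
    # (history_word, target_word, distance) -> how often the pair is seen at that distance
    distance_counts = Counter()
    # (history_word, target_word) -> number of target positions whose history contains history_word
    occurrence_counts = Counter()

    for i, target in enumerate(tokens):
        history = tokens[max(0, i - max_distance):i]
        # Distance 1 is the word immediately before the target.
        for distance, word in enumerate(reversed(history), start=1):
            distance_counts[(word, target, distance)] += 1
        # Occurrence ignores position: each distinct history word counts once per target.
        for word in set(history):
            occurrence_counts[(word, target)] += 1

    return distance_counts, occurrence_counts


tokens = "the stock market fell as the stock price dropped".split()
distance_counts, occurrence_counts = pair_statistics(tokens)
```

In this toy sentence, "the" precedes "stock" at distance 1 twice and at distance 6 once, while the occurrence count for the pair records only that "the" appeared somewhere in the history of each of the two "stock" positions. Keeping the two tables separate mirrors the abstract's distinction between distance and occurrence information.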

Haizhou Li | Chng Eng Siong | Tze Yuang Chong | Rafael E. Banchs
