Paper Information - Long Distance Dependency in Language Modeling: An Empirical Study

This paper presents an extensive empirical study on two language modeling techniques, linguistically-motivated word skipping and predictive clustering, both of which are used in capturing long distance word dependencies that are beyond the scope of a word trigram model. We compare the techniques to others that were proposed previously for the same purpose. We evaluate the resulting models on the task of Japanese Kana-Kanji conversion. We show that the two techniques, while simple, outperform existing methods studied in this paper, and lead to language models that perform significantly better than a word trigram model. We also investigate how factors such as training corpus size and genre affect the performance of the models.

Jianfeng Gao | Hisami Suzuki
