Long Distance Dependency in Language Modeling: An Empirical Study

This paper presents an extensive empirical study of two language modeling techniques, linguistically motivated word skipping and predictive clustering, both of which are used to capture long-distance word dependencies that lie beyond the scope of a word trigram model. We compare these techniques with methods previously proposed for the same purpose and evaluate the resulting models on the task of Japanese Kana-Kanji conversion. We show that the two techniques, while simple, outperform the existing methods studied in this paper and yield language models that perform significantly better than a word trigram model. We also investigate how factors such as training corpus size and genre affect the performance of the models.
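Predictive clustering is named above but not defined. Here is a minimal sketch, assuming the common formulation in which each word belongs to exactly one cluster and the trigram probability is factored as P(w | h) = P(c(w) | h) × P(w | h, c(w)), where h = (w_{i-2}, w_{i-1}). The toy corpus, the cluster map, and the function names `p_trigram` and `p_predictive` are hypothetical illustrations, not the paper's implementation. Word skipping is not shown.

```python
from collections import Counter

# Hypothetical toy training corpus, for illustration only.
corpus = "the cat sat on the mat the dog sat on the rug".split()

# Hypothetical word-to-cluster map: each word belongs to exactly one cluster.
cluster = {
    "the": "DET", "cat": "NOUN", "dog": "NOUN", "mat": "NOUN",
    "rug": "NOUN", "sat": "VERB", "on": "PREP",
}

# Count histories h = (w_{i-2}, w_{i-1}), (h, cluster) pairs, and (h, word) pairs.
hist = Counter()
hist_cluster = Counter()
hist_word = Counter()
for w2, w1, w in zip(corpus, corpus[1:], corpus[2:]):
    h = (w2, w1)
    hist[h] += 1
    hist_cluster[h, cluster[w]] += 1
    hist_word[h, w] += 1


def p_trigram(w, h):
    """Unsmoothed maximum-likelihood trigram estimate P(w | h)."""
    if hist[h] == 0:
        return 0.0  # unseen history: no estimate without smoothing
    return hist_word[h, w] / hist[h]


def p_predictive(w, h):
    """Unsmoothed predictive clustering estimate P(c(w) | h) * P(w | h, c(w))."""
    c = cluster[w]
    if hist[h] == 0 or hist_cluster[h, c] == 0:
        return 0.0  # unseen history or cluster: no estimate without smoothing
    p_cluster = hist_cluster[h, c] / hist[h]
    p_word_given_cluster = hist_word[h, w] / hist_cluster[h, c]
    return p_cluster * p_word_given_cluster


h = ("on", "the")
print(p_trigram("mat", h), p_predictive("mat", h))  # 0.5 0.5
```

Without smoothing, the two estimates coincide because the count of (h, c(w)) cancels. The practical benefit of the factorization comes from smoothing P(c | h) and P(w | h, c) separately, which this sketch deliberately omits.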
