Exploiting Headword Dependency and Predictive Clustering for Language Modeling

This paper presents several practical ways of incorporating linguistic structure into language models. A headword detector is first applied to identify the headword of each phrase in a sentence. A permuted headword trigram model (PHTM) is then built from the annotated corpus. Finally, PHTM is extended to a cluster PHTM (C-PHTM) by defining clusters of similar words in the corpus. We evaluate the proposed models on a realistic application, Japanese Kana-Kanji conversion. Experiments show that C-PHTM achieves a 15% error rate reduction over the word trigram model. This demonstrates that simple methods such as the headword trigram and predictive clustering can effectively capture long-distance word dependencies and substantially outperform a word trigram model.
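
As a rough illustration of the predictive clustering idea mentioned above, the sketch below factors a word probability into a cluster prediction followed by a word prediction within that cluster, P(w | h) = P(c(w) | h) * P(w | c(w), h). It is not the paper's C-PHTM: the toy corpus, the hand-assigned clusters in `cluster_of`, the one-word (bigram) context, and the unsmoothed count-based estimates are all assumptions made for brevity, and headword detection is omitted entirely.

```python
from collections import Counter

# Toy corpus and hand-assigned word clusters (both hypothetical, for illustration only).
corpus = "the cat sat on the mat the dog sat on the rug".split()
cluster_of = {
    "the": "DET", "on": "PREP", "sat": "VERB",
    "cat": "ANIMAL", "dog": "ANIMAL", "mat": "OBJECT", "rug": "OBJECT",
}

# Count (previous word, word) pairs; a one-word context stands in for the history h.
bigrams = list(zip(corpus, corpus[1:]))
context_count = Counter(prev for prev, _ in bigrams)
word_count = Counter(bigrams)
cluster_count = Counter((prev, cluster_of[w]) for prev, w in bigrams)


def p_cluster(cluster, prev):
    """Unsmoothed estimate of P(cluster | prev)."""
    total = context_count[prev]
    return cluster_count[(prev, cluster)] / total if total else 0.0


def p_word_given_cluster(word, prev):
    """Unsmoothed estimate of P(word | cluster(word), prev)."""
    total = cluster_count[(prev, cluster_of[word])]
    return word_count[(prev, word)] / total if total else 0.0


def p_predictive(word, prev):
    """Predictive clustering: predict the cluster first, then the word inside it."""
    return p_cluster(cluster_of[word], prev) * p_word_given_cluster(word, prev)


def p_direct(word, prev):
    """Plain unsmoothed bigram estimate, for comparison."""
    total = context_count[prev]
    return word_count[(prev, word)] / total if total else 0.0


if __name__ == "__main__":
    for w in ("cat", "mat", "sat"):
        print(w, p_predictive(w, "the"), p_direct(w, "the"))
```

With raw counts the factored and direct estimates coincide, because the decomposition is exact. Any practical benefit would come from smoothing the two factors separately, which this sketch does not attempt.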
