EXPLOITING SYNTACTIC, SEMANTIC, AND LEXICAL REGULARITIES IN LANGUAGE MODELING VIA DIRECTED MARKOV RANDOM FIELDS

We present a directed Markov random field (MRF) model that combines n‐gram models, probabilistic context‐free grammars (PCFGs), and probabilistic latent semantic analysis (PLSA) for statistical language modeling. Although the composite directed MRF model potentially has an exponential number of loops and becomes a context‐sensitive grammar, we can still estimate its parameters in cubic time using an efficient modified Expectation‐Maximization (EM) method, the generalized inside–outside algorithm, which extends the inside–outside algorithm to incorporate the effects of the n‐gram and PLSA language models. We generalize various smoothing techniques to alleviate the sparseness of n‐gram counts when hidden variables are present. We also derive an analogous algorithm for finding the most likely parse of a sentence and for calculating the probability of an initial subsequence of a sentence under the composite language model. Our experimental results on the Wall Street Journal corpus show significant reductions in perplexity compared to the state‐of‐the‐art baseline trigram model with Good–Turing and Kneser–Ney smoothing.
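For orientation, the classic inside algorithm that the generalized inside–outside algorithm extends can be sketched as below. This is only the standard PCFG computation for a grammar in Chomsky normal form, not the composite n‐gram/PCFG/PLSA model described above; the function name `inside_probability`, the dictionary layout, and the toy grammar are all hypothetical and chosen for illustration.

```python
from collections import defaultdict

# Hypothetical toy PCFG in Chomsky normal form (illustration only).
LEXICAL = {("NP", "she"): 0.5, ("NP", "fish"): 0.5, ("V", "eats"): 1.0}
BINARY = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}


def inside_probability(words, lexical, binary, start="S"):
    """Classic inside algorithm: P(words) summed over all parses.

    lexical maps (A, word) -> P(A -> word)
    binary  maps (A, B, C) -> P(A -> B C)
    """
    n = len(words)
    # chart[i][j][A] = probability that nonterminal A derives words[i:j]
    chart = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]

    # Base case: spans of length 1 come from lexical rules.
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w:
                chart[i][i + 1][a] += p

    # Recursive case: combine two adjacent sub-spans with a binary rule.
    # The three nested position loops give the cubic dependence on length.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, b, c), p in binary.items():
                    left = chart[i][k].get(b, 0.0)
                    right = chart[k][j].get(c, 0.0)
                    if left and right:
                        chart[i][j][a] += p * left * right

    return chart[0][n].get(start, 0.0)


prob = inside_probability(["she", "eats", "fish"], LEXICAL, BINARY)
```

With this toy grammar the only parse of "she eats fish" uses S → NP VP and VP → V NP, so the result is 1.0 × 0.5 × (1.0 × 1.0 × 0.5) = 0.25. Replacing the sum over split points and rules with a maximum gives the corresponding most-likely-parse (Viterbi) variant.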
