Enhancing Language Models in Statistical Machine Translation with Backward N-grams and Mutual Information Triggers

In this paper, with a belief that a language model that embraces a larger context provides better prediction ability, we present two extensions to standard n-gram language models in statistical machine translation: a backward language model that augments the conventional forward language model, and a mutual information trigger model which captures long-distance dependencies that go beyond the scope of standard n-gram language models. We integrate the two proposed models into phrase-based statistical machine translation and conduct experiments on large-scale training data to investigate their effectiveness. Our experimental results show that both models are able to significantly improve translation quality and collectively achieve up to 1 BLEU point over a competitive baseline.

[1]  Franz Josef Och,et al.  Minimum Error Rate Training in Statistical Machine Translation , 2003, ACL.

[2]  Ying Zhang,et al.  Distributed Language Modeling for N-best List Re-ranking , 2006, EMNLP.

[3]  Eiichiro Sumita,et al.  Bidirectional Phrase-based Statistical Machine Translation , 2009, EMNLP.

[4]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[5]  Qun Liu,et al.  Maximum Entropy Based Phrase Reordering Model for Statistical Machine Translation , 2006, ACL.

[6]  Daniel Marcu,et al.  Statistical Phrase-Based Translation , 2003, NAACL.

[7]  Patrick Wambacq,et al.  Confidence scoring based on backward language models , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[8]  Thorsten Brants,et al.  Large Language Models in Machine Translation , 2007, EMNLP.

[9]  Kamel Smaïli,et al.  New Confidence Measures for Statistical Machine Translation , 2009, ICAART.

[10]  Ronald Rosenfeld,et al.  Adaptive Statistical Language Modeling; A Maximum Entropy Approach , 1994 .

[11]  Robert L. Mercer,et al.  The Mathematics of Statistical Machine Translation: Parameter Estimation , 1993, CL.

[12]  Philipp Koehn,et al.  Statistical Significance Tests for Machine Translation Evaluation , 2004, EMNLP.

[13]  Miles Osborne,et al.  Randomised Language Modelling for Statistical Machine Translation , 2007, ACL.

[14]  Dekai Wu,et al.  Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora , 1997, CL.

[15]  Joshua Goodman,et al.  A bit of progress in language modeling , 2001, Comput. Speech Lang..

[16]  Kenji Yamada,et al.  Syntax-based language models for statistical machine translation , 2003, ACL 2003.

[17]  Hermann Ney,et al.  Extending Statistical Machine Translation with Discriminative and Trigger-Based Lexicon Models , 2009, EMNLP.

[18]  Matt Post,et al.  Syntax-based language models for statistical machine translation , 2010 .

[19]  Guodong Zhou Modeling of Long Distance Context Dependency , 2004, COLING.

[20]  Zhou GuoDong Modeling of long distance context dependency , 2004, COLING 2004.

[21]  Jinxi Xu,et al.  A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model , 2008, ACL.

[22]  Maria Leonor Pacheco,et al.  of the Association for Computational Linguistics: , 2001 .

[23]  Dekai Wu,et al.  A Polynomial-Time Algorithm for Statistical Machine Translation , 1996, ACL.

[24]  Ahmad Emami,et al.  Large-Scale Distributed Language Modeling , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[25]  Andreas Stolcke,et al.  SRILM - an extensible language modeling toolkit , 2002, INTERSPEECH.

[26]  Daniel Gildea Parsers as language models for statistical machine translation , 2008 .

[27]  David Chiang,et al.  Hierarchical Phrase-Based Translation , 2007, CL.