Left language model state for syntactic machine translation

Left Length \ Right Length       1        2        3        4
0                           -0.741   -1.062   -1.357   -1.701
1                           -0.269   -0.429   -0.588   -0.836
2                           -0.129   -0.236   -0.362   -0.567
3                            0.007   -0.061   -0.128   -0.314
4                            0.220    0.202    0.169    0.037

Short left state predicts poor performance.

Conclusion
• Equivalent quality with 11% net reduction in CPU time.
• Left state minimization combines fragments that perform poorly.
• Right state minimization combines fragments that perform well.
• Future work using state length as a rest cost estimator.
• Clean high-level C++ interface for language models in syntactic decoders.
• Live in Moses and cdec.

http://kheafield.com/code/kenlm/
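
The trend claimed for the table can be checked mechanically. The following is a minimal sketch, not code from KenLM: the names `SCORES`, `score`, `improves_with_left_length`, and `worsens_with_right_length` are hypothetical, and it assumes that a higher table value means better performance, since the text does not say what the values measure.

```python
# Values copied from the table: rows are left state length 0-4,
# columns are right state length 1-4.
# Assumption: a higher value means better performance; the source text
# does not state what quantity the table reports.
SCORES = [
    [-0.741, -1.062, -1.357, -1.701],  # left length 0
    [-0.269, -0.429, -0.588, -0.836],  # left length 1
    [-0.129, -0.236, -0.362, -0.567],  # left length 2
    [0.007, -0.061, -0.128, -0.314],   # left length 3
    [0.220, 0.202, 0.169, 0.037],      # left length 4
]


def score(left_length, right_length):
    """Look up the table value for a (left length, right length) pair."""
    return SCORES[left_length][right_length - 1]


def improves_with_left_length():
    """True if, for every right length, the value rises as left length grows."""
    return all(
        score(left, right) < score(left + 1, right)
        for right in range(1, 5)
        for left in range(4)
    )


def worsens_with_right_length():
    """True if, for every left length, the value falls as right length grows."""
    return all(
        score(left, right) > score(left, right + 1)
        for left in range(5)
        for right in range(1, 4)
    )


print(improves_with_left_length(), worsens_with_right_length())
```

Both checks hold for the values above, which is consistent with the statement that a short left state predicts poor performance. A lookup such as `score(left, right)` is also the simplest form a state-length-based rest cost estimate could take, though the text leaves that as future work.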
