On LM Heuristics for the Cube Growing Algorithm

Current approaches to statistical machine translation try to incorporate more structure into the translation process by including explicit syntactic information in the form of a formal grammar (which may, but need not, correspond to a linguistically motivated grammar). These more structured models incur a higher generation cost, so efficient algorithms must be developed. In this paper we concentrate on the cube growing algorithm, a lazy version of the cube pruning algorithm. The efficiency of this algorithm depends on a heuristic for language model computation, which is discussed only briefly in the original paper. We investigate the effect of this heuristic on translation performance and efficiency, and propose a new heuristic that reduces memory requirements and computation time while maintaining translation performance.
