Faster Phrase-Based Decoding by Refining Feature State

We contribute a faster decoding algorithm for phrase-based machine translation. Translation hypotheses keep track of state, such as context for the language model and coverage of words in the source sentence. Most features depend upon only part of the state, but traditional algorithms, including cube pruning, handle state atomically. For example, cube pruning will repeatedly query the language model with hypotheses that differ only in source coverage, even though source coverage is irrelevant to the language model. Our key contribution avoids this behavior by placing hypotheses into equivalence classes, masking the parts of state that matter least to the score. Moreover, we exploit shared words in hypotheses to iteratively refine language model scores rather than handling language model state atomically. Since our algorithm and cube pruning are both approximate, the improvement can be used to increase speed or accuracy. When tuned to attain the same accuracy, our algorithm is 4.0–7.7 times as fast as the Moses decoder with cube pruning.
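The equivalence-class idea can be made concrete with a small sketch. The Python below is illustrative only, not the paper's implementation: the tuple representation of hypotheses, the `score_lm` callback, and the toy model are all invented for exposition. It groups hypotheses by the part of state the language model actually reads (its context words), masking source coverage, so the model is queried once per class rather than once per hypothesis.

```python
from collections import defaultdict

def score_naive(hypotheses, phrase, score_lm):
    """Atomic state handling: one LM query per hypothesis,
    even when hypotheses share the same LM context."""
    return {hyp: score_lm(hyp[0], phrase) for hyp in hypotheses}

def score_by_class(hypotheses, phrase, score_lm):
    """Equivalence classes: mask source coverage (irrelevant to
    the LM) and query the LM once per distinct LM context."""
    classes = defaultdict(list)
    for lm_context, coverage in hypotheses:
        classes[lm_context].append((lm_context, coverage))
    scores = {}
    for lm_context, members in classes.items():
        lm_score = score_lm(lm_context, phrase)  # one query per class
        for hyp in members:
            scores[hyp] = lm_score
    return scores

if __name__ == "__main__":
    calls = []
    def toy_lm(context, phrase):
        calls.append((context, phrase))
        return -1.0 * len(phrase)  # stand-in score, not a real LM

    hyps = [(("the", "cat"), frozenset({0, 1})),
            (("the", "cat"), frozenset({2, 3})),  # same LM context
            (("a", "dog"),   frozenset({0, 1}))]
    score_by_class(hyps, ("sat",), toy_lm)
    print(len(calls))  # 2 queries instead of 3
```

The same grouping generalizes: each feature sees only the slice of state it depends on. The paper goes further than this sketch by refining classes iteratively, revealing more language model context only as needed rather than scoring full state up front.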
