Optimal Beam Search for Machine Translation

Beam search is a fast and empirically effective method for translation decoding, but it offers no formal guarantees about search error. We develop a new decoding algorithm that combines the speed of beam search with the optimality certificates of Lagrangian relaxation, and apply it to phrase-based and syntax-based translation decoding. The new method is efficient, uses standard MT algorithms, and returns an exact solution on the majority of translation examples in our test data. The algorithm is 3.5 times faster than an optimized incremental constraint-based decoder for phrase-based translation and 4 times faster for syntax-based translation.
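To make the idea of a beam search that can certify its own answer concrete, here is a minimal sketch on a toy chain-scoring problem. It is not the paper's decoder: the problem, the function name `beam_search_with_certificate`, and the crude additive upper bound are all assumptions made for illustration, and the bound stands in for the tighter bounds that the paper obtains from Lagrangian relaxation. The sketch records, for every pruned hypothesis, an upper bound on the best completion it could have reached; if the returned score is at least as large as all of those bounds, no pruned hypothesis could have done better, so the answer is provably optimal.

```python
from typing import List, Tuple


def beam_search_with_certificate(
    unary: List[List[float]],
    pairwise: List[List[float]],
    beam_size: int,
) -> Tuple[float, List[int], bool]:
    """Toy beam search over label sequences (hypothetical example problem).

    The score of a sequence y is sum_i unary[i][y_i] + sum_{i>0} pairwise[y_{i-1}][y_i].
    Returns (best_score, best_labels, certified), where certified=True means
    the result is provably optimal under the bound used below.
    """
    n = len(unary)
    num_labels = len(unary[0])
    max_pair = max(max(row) for row in pairwise)

    # future_ub[i]: admissible (never too small) bound on the score that
    # positions i..n-1 can still contribute. Assumed crude bound, for illustration.
    future_ub = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        step = max(unary[i]) + (max_pair if i > 0 else 0.0)
        future_ub[i] = future_ub[i + 1] + step

    beam: List[Tuple[float, List[int]]] = [(0.0, [])]
    pruned_bound = float("-inf")  # best score any pruned hypothesis could reach

    for i in range(n):
        candidates = []
        for score, labels in beam:
            for y in range(num_labels):
                s = score + unary[i][y]
                if labels:
                    s += pairwise[labels[-1]][y]
                candidates.append((s, labels + [y]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_size]
        # Remember how good each discarded hypothesis could have become.
        for s, _ in candidates[beam_size:]:
            pruned_bound = max(pruned_bound, s + future_ub[i + 1])

    best_score, best_labels = beam[0]
    certified = best_score >= pruned_bound
    return best_score, best_labels, certified


if __name__ == "__main__":
    unary = [[1.0, 0.5], [0.0, 0.0], [0.0, 0.0]]
    pairwise = [[0.0, 0.0], [0.0, 5.0]]
    for k in (1, 2):
        print(k, beam_search_with_certificate(unary, pairwise, k))
```

On this toy input a beam of size 1 commits to the locally best first label, returns a score of 1.0, and cannot certify it, while a beam of size 2 returns the sequence `[1, 1, 1]` with score 10.5 together with a certificate. A tighter bound would let a smaller beam certify more often, which is the role Lagrangian relaxation plays in the method described above.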
