Translation Rules with Right-Hand Side Lattices

In Corpus-Based Machine Translation, the search space of translation candidates for a given input sentence is often defined by a set of (cycle-free) context-free grammar rules. This happens naturally in Syntax-Based Machine Translation and Hierarchical Phrase-Based Machine Translation, where the representation is the set of target-side halves of the synchronous rules used to parse the input sentence. It is also possible to describe Phrase-Based Machine Translation in this framework. We propose a natural extension to this representation: lattice-rules, whose right-hand sides are lattices, which make it easy to encode an exponential number of variations of each rule. We also demonstrate how the representation of the search space affects decoding efficiency, and how this representation can be optimized.
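To illustrate why a lattice on the right-hand side can be compact, the following minimal sketch expands a toy lattice into the flat right-hand sides it encodes. The lattice encoding (a dictionary from states to labeled edges), the function name `expand`, the empty-string convention for skippable edges, and the example words are all assumptions made for illustration; they are not taken from the paper's implementation.

```python
def expand(lattice, state, final):
    """Enumerate every flat right-hand side encoded by the lattice.

    `lattice` maps a state to a list of (label, next_state) edges.
    An empty label is treated here as an epsilon edge (emits nothing).
    """
    if state == final:
        return [[]]
    paths = []
    for label, nxt in lattice.get(state, []):
        for tail in expand(lattice, nxt, final):
            head = [label] if label else []
            paths.append(head + tail)
    return paths


# Hypothetical lattice for one rule; "X1" stands for a non-terminal.
LATTICE = {
    0: [("the", 1), ("a", 1), ("", 1)],   # optional determiner
    1: [("X1", 2)],
    2: [("is", 3), ("was", 3)],
    3: [("big", 4), ("large", 4)],
}

flat_rules = expand(LATTICE, 0, 4)
n_edges = sum(len(edges) for edges in LATTICE.values())

print(n_edges, "edges encode", len(flat_rules), "flat right-hand sides")
for rhs in flat_rules:
    print(" ".join(rhs))
```

In this toy case the number of edges grows with the sum of the alternatives at each position, while the number of flat rules grows with their product, which is the sense in which one lattice-rule can stand for exponentially many ordinary rules.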
