Sampling Phrase Tables for the Moses Statistical Machine Translation System

Abstract. The idea of virtual phrase tables for statistical machine translation (SMT), which construct phrase table entries on demand by sampling a fully indexed bitext, was first proposed ten years ago by Callison-Burch et al. (2005). However, until recently (Germann, 2014), no working and practical implementation of this approach was available in the Moses SMT system. We describe and evaluate this implementation in more detail. Sampling phrase tables are much faster to build than conventional phrase tables and are competitive with them in terms of translation quality and speed.
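The core idea of on-demand lookup by sampling an indexed corpus can be illustrated with a minimal Python sketch. It is not the Moses implementation: the function names (`build_suffix_array`, `occurrence_range`, `sample_occurrences`) are hypothetical, the suffix array is built naively, and the sketch covers only the source side. Using word alignments to extract and score the target phrases at each sampled position is omitted.

```python
import random


def build_suffix_array(tokens):
    """Return start positions of all suffixes, sorted lexicographically.

    Naive construction for illustration only; real systems use far more
    efficient algorithms and compact on-disk representations.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def occurrence_range(tokens, sa, phrase):
    """Binary-search the suffix array for the block of suffixes that
    start with `phrase`. Returns a half-open index range [first, last)."""
    n = len(phrase)

    # Lower bound: first suffix whose n-token prefix is >= phrase.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if tokens[sa[mid]:sa[mid] + n] < phrase:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose n-token prefix is > phrase.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if tokens[sa[mid]:sa[mid] + n] <= phrase:
            lo = mid + 1
        else:
            hi = mid
    return first, lo


def sample_occurrences(tokens, sa, phrase, max_samples, rng=random):
    """Return at most `max_samples` corpus positions where `phrase` occurs."""
    first, last = occurrence_range(tokens, sa, phrase)
    positions = sa[first:last]
    if len(positions) <= max_samples:
        return sorted(positions)
    return sorted(rng.sample(positions, max_samples))


# Toy source-side corpus (assumed data, for illustration only).
corpus = "the cat sat on the mat the cat ran".split()
index = build_suffix_array(corpus)
hits = sample_occurrences(corpus, index, ["the", "cat"], max_samples=10)
```

Capping the number of sampled occurrences bounds the work done per phrase lookup, so frequent phrases cost no more to look up than moderately frequent ones.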

[1]  E. S. Pearson,et al.  The Use of Confidence or Fiducial Limits Illustrated in the Case of the Binomial , 1934 .

[2]  Philipp Koehn,et al.  Margin Infused Relaxed Algorithm for Moses , 2011, Prague Bull. Math. Linguistics.

[3]  Chris Callison-Burch,et al.  Scaling Phrase-Based Statistical Machine Translation to Larger Corpora and Longer Phrases , 2005, ACL.

[4]  Franz Josef Och,et al.  Minimum Error Rate Training in Statistical Machine Translation , 2003, ACL.

[5]  Vladimir Eidelman,et al.  cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models , 2010, ACL.

[6]  Marcin Junczys-Dowmunt,et al.  Phrasal Rank-Encoding: Exploiting Phrase Redundancy and Translational Relations for Phrase Table Compression , 2012, Prague Bull. Math. Linguistics.

[7]  David Chiang,et al.  Hierarchical Phrase-Based Translation , 2007, CL.

[8]  Ulrich Germann.  Dynamic Phrase Tables for Machine Translation in an Interactive Post-editing Scenario , 2014 .

[9]  Hermann Ney,et al.  Efficient Phrase-Table Representation for Machine Translation with Applications to Online MT and Speech Translation , 2007, NAACL.

[10]  Hermann Ney,et al.  The RWTH machine translation system for IWSLT 2008. , 2008, IWSLT.

[11]  Eugene W. Myers,et al.  Suffix arrays: a new method for on-line string searches , 1993, SODA '90.

[12]  Christopher D. Manning,et al.  A Simple and Effective Hierarchical Phrase Reordering Model , 2008, EMNLP.

[13]  Adam David Lopez,et al.  Machine Translation by Pattern Matching , 2008 .

[14]  Chris Callison-Burch,et al.  Demonstration of Joshua: An Open Source Toolkit for Parsing-based Machine Translation , 2009, ACL.

[15]  Roland Kuhn,et al.  Phrasetable Smoothing for Statistical Machine Translation , 2006, EMNLP.

[16]  Chris Callison-Burch,et al.  Hierarchical Phrase-Based Grammar Extraction in Joshua: Suffix Arrays and Prefix Trees , 2010 .

[17]  Adam Lopez,et al.  Hierarchical Phrase-Based Translation with Suffix Arrays , 2007, EMNLP.