Preference Grammars and Soft Syntactic Constraints for GHKM Syntax-based Statistical Machine Translation

In this work, we investigate the effectiveness of two techniques for a featurebased integration of syntactic information into GHKM string-to-tree statistical machine translation (Galley et al., 2004): (1.) Preference grammars on the target language side promote syntactic wellformedness during decoding while also allowing for derivations that are not linguistically motivated (as in hierarchical translation). (2.) Soft syntactic constraints augment the system with additional sourceside syntax features while not modifying the set of string-to-tree translation rules or the baseline feature scores. We conduct experiments with a stateof-the-art setup on an English!German translation task. Our results suggest that preference grammars for GHKM translation are inferior to the plain targetsyntactified model, whereas the enhancement with soft source syntactic constraints provides consistent gains. By employing soft source syntactic constraints with sparse features, we are able to achieve improvements of up to 0.7 points BLEU and 1.0 points TER.

[1]  Philipp Koehn,et al.  Moses: Open Source Toolkit for Statistical Machine Translation , 2007, ACL.

[2]  Philipp Koehn,et al.  Europarl: A Parallel Corpus for Statistical Machine Translation , 2005, MTSUMMIT.

[3]  Noah A. Smith,et al.  Preference Grammars: Softening Syntactic Constraints to Improve Statistical Machine Translation , 2009, NAACL.

[4]  Philipp Koehn,et al.  GHKM Rule Extraction and Scope-3 Parsing in Moses , 2012, WMT@NAACL-HLT.

[5]  Andreas Stolcke,et al.  SRILM - an extensible language modeling toolkit , 2002, INTERSPEECH.

[6]  F ChenStanley,et al.  An Empirical Study of Smoothing Techniques for Language Modeling , 1996, ACL.

[7]  Stephan Vogel,et al.  Parallel Implementations of Word Alignment Tool , 2008, SETQALNLP.

[8]  Feifei Zhai,et al.  Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax , 2011, EMNLP.

[9]  Daniel Marcu,et al.  What Can Syntax-Based MT Learn from Phrase-Based MT? , 2007, EMNLP.

[10]  Hermann Ney,et al.  A Systematic Comparison of Various Statistical Alignment Models , 2003, CL.

[11]  Andreas Zollmann,et al.  Syntax Augmented Machine Translation via Chart Parsing , 2006, WMT@HLT-NAACL.

[12]  Philipp Koehn,et al.  Edinburgh’s Syntax-Based Machine Translation Systems , 2013, WMT@ACL.

[13]  Jean-Cédric Chappelier,et al.  A Generalized CYK Algorithm for Parsing Stochastic CFG , 1998, TAPD.

[14]  Philipp Koehn,et al.  Findings of the 2014 Workshop on Statistical Machine Translation , 2014, WMT@ACL.

[15]  Kenneth Heafield,et al.  KenLM: Faster and Smaller Language Model Queries , 2011, WMT@EMNLP.

[16]  Philip Resnik,et al.  Soft Syntactic Constraints for Hierarchical Phrased-Based Translation , 2008, ACL.

[17]  Daniel Marcu,et al.  Binarizing Syntax Trees to Improve Syntax-Based Machine Translation Accuracy , 2007, EMNLP.

[18]  Matthew G. Snover,et al.  A Study of Translation Edit Rate with Targeted Human Annotation , 2006, AMTA.

[19]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[20]  Rabih Zbib,et al.  Factored Soft Source Syntactic Constraints for Hierarchical Machine Translation , 2013, EMNLP.

[21]  Dan Klein,et al.  Learning Accurate, Compact, and Interpretable Tree Annotation , 2006, ACL.

[22]  Daniel Marcu,et al.  Re-structuring, Re-labeling, and Re-aligning for Syntax-Based Machine Translation , 2010, CL.

[23]  George F. Foster,et al.  Batch Tuning Strategies for Statistical Machine Translation , 2012, NAACL.

[24]  Daniel Marcu,et al.  Scalable Inference and Training of Context-Rich Syntactic Translation Models , 2006, ACL.

[25]  Hermann Ney,et al.  Improved backing-off for M-gram language modeling , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[26]  Philipp Koehn,et al.  A unified framework for phrase-based, hierarchical, and syntax-based statistical machine translation , 2009, IWSLT.

[27]  Ralph Weischedel,et al.  A STUDY OF TRANSLATION ERROR RATE WITH TARGETED HUMAN ANNOTATION , 2005 .

[28]  Helmut Schmid Efficient Parsing of Highly Ambiguous Context-Free Grammars with Bit Vectors , 2004, COLING.

[29]  David Chiang,et al.  Hierarchical Phrase-Based Translation , 2007, CL.

[30]  Philipp Koehn,et al.  Edinburgh's Syntax-Based Systems at WMT 2014 , 2014, WMT@ACL.

[31]  Daniel Marcu,et al.  What’s in a translation rule? , 2004, NAACL.

[32]  H. Ney,et al.  A Cocktail of Deep Syntactic Features for Hierarchical Machine Translation , 2010, AMTA.

[33]  Daniel Marcu,et al.  Statistical Phrase-Based Translation , 2003, NAACL.