Integrating source-language context into phrase-based statistical machine translation

The translation features typically used in Phrase-Based Statistical Machine Translation (PB-SMT) model dependencies between source and target phrases, but not among the source-language phrases themselves. A substantial body of research has demonstrated that integrating source context modelling directly into log-linear PB-SMT can positively influence the weighting and selection of target phrases, and thus improve translation quality. In this contribution we present a revised and extended account of our previous work on using a range of contextual features, including lexical features of neighbouring words, supertags, and dependency information. We add a number of novel aspects: the use of semantic roles as new contextual features in PB-SMT, additional language pairs, and an examination of how our approach scales to larger amounts of training data. While our results are mixed across feature selections, classifier hyperparameters, language pairs, and learning curves, we observe that including contextual features of the source sentence generally produces improvements. The most significant improvements involve the integration of long-distance contextual features, such as dependency relations in combination with part-of-speech tags in Dutch-to-English subtitle translation, the combination of dependency parse and semantic role information in English-to-Dutch parliamentary debate translation, or supertag features in English-to-Chinese translation.
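As a rough illustration of the idea, the sketch below collects the words and part-of-speech tags neighbouring a source phrase and combines feature values in a log-linear score. It is a minimal sketch under stated assumptions, not the system described here: the function names, the feature naming scheme, the window size, and all feature values and weights are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def context_features(tokens: Sequence[str], tags: Sequence[str],
                     start: int, end: int, window: int = 1) -> Tuple[str, ...]:
    """Collect words and tags around the source phrase tokens[start:end].

    Hypothetical feature scheme: L<k>/R<k> is the k-th token to the left or
    right of the phrase; positions outside the sentence are padded.
    """
    feats = []
    for offset in range(1, window + 1):
        for idx, side in ((start - offset, "L"), (end - 1 + offset, "R")):
            inside = 0 <= idx < len(tokens)
            word = tokens[idx] if inside else "<pad>"
            tag = tags[idx] if inside else "<pad>"
            feats.append(f"{side}{offset}:w={word}")
            feats.append(f"{side}{offset}:t={tag}")
    return tuple(feats)


def log_linear_score(feature_values: Dict[str, float],
                     weights: Dict[str, float]) -> float:
    """Weighted sum of (log-domain) feature values, as in a log-linear model."""
    return sum(weights.get(name, 0.0) * value
               for name, value in feature_values.items())


# Illustrative sentence; the source phrase is the single token "bank".
tokens = ["the", "bank", "of", "the", "river"]
tags = ["DT", "NN", "IN", "DT", "NN"]
features = context_features(tokens, tags, start=1, end=2)

# Hypothetical values: a baseline phrase log-probability plus one extra
# context-informed log-probability that a classifier might supply.
candidate = {"baseline_logprob": 2.0, "context_logprob": -1.0}
weights = {"baseline_logprob": 0.5, "context_logprob": 1.0}
score = log_linear_score(candidate, weights)
```

In a setup of this kind, the context features would be fed to a classifier whose output for each candidate target phrase becomes one additional feature in the log-linear combination, with its weight tuned alongside the baseline features.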
