Monolingual Marginal Matching for Translation Model Adaptation

When a machine translation (MT) model trained on OLD-domain parallel data is used to translate NEW-domain text, one major challenge is the large number of out-of-vocabulary (OOV) words and words with new translation senses. We present a method that uses NEW-domain comparable document pairs to identify new translations of both known and unknown source-language words. Starting with a joint distribution of source-target word pairs derived from the OLD-domain parallel corpus, our method recovers a new joint distribution that matches the marginal distributions of the NEW-domain comparable document pairs while minimizing the divergence from the OLD-domain distribution. Adding the learned translations to our French-English MT model results in gains of about 2 BLEU points over strong baselines.
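The marginal-matching idea can be illustrated with a small sketch. The abstract does not specify the divergence or the optimizer, so the code below is an assumption rather than the paper's algorithm: it uses iterative proportional fitting (IPF), a classical procedure that finds the joint distribution closest in KL divergence to a starting joint while matching given row and column marginals. The function name `match_marginals` and the toy numbers are hypothetical, and the sketch matches a single pair of corpus-level marginals rather than per-document-pair marginals.

```python
def match_marginals(old_joint, src_marginal, tgt_marginal, iters=200, tol=1e-10):
    """Iterative proportional fitting (an illustrative stand-in, not the paper's method).

    old_joint:    rows = source words, columns = target words (OLD-domain joint).
    src_marginal: desired NEW-domain source-word distribution.
    tgt_marginal: desired NEW-domain target-word distribution.
    """
    joint = [row[:] for row in old_joint]  # work on a copy
    n_src, n_tgt = len(joint), len(joint[0])
    for _ in range(iters):
        # Rescale each row so it sums to the desired source marginal.
        for i in range(n_src):
            row_sum = sum(joint[i])
            if row_sum > 0:
                scale = src_marginal[i] / row_sum
                joint[i] = [v * scale for v in joint[i]]
        # Rescale each column so it sums to the desired target marginal.
        for j in range(n_tgt):
            col_sum = sum(joint[i][j] for i in range(n_src))
            if col_sum > 0:
                scale = tgt_marginal[j] / col_sum
                for i in range(n_src):
                    joint[i][j] *= scale
        # Columns are now exact; stop once the rows also match.
        err = max(abs(sum(joint[i]) - src_marginal[i]) for i in range(n_src))
        if err < tol:
            break
    return joint


# Toy example with made-up numbers: two source words, two target words.
old = [[0.4, 0.1],
       [0.1, 0.4]]
new_joint = match_marginals(old, src_marginal=[0.7, 0.3], tgt_marginal=[0.6, 0.4])
```

One limitation of this simplification is worth noting: IPF only rescales existing cells, so a source-target pair with zero probability in the OLD-domain joint stays at zero. Proposing translations for OOV words, as the method described above does, would therefore need something beyond this sketch, such as smoothing the starting distribution or a different objective.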
