Linear Mixture Models for Robust Machine Translation

As larger and more diverse parallel texts become available, how can we leverage heterogeneous data to train robust machine translation systems that achieve good translation quality across a variety of test domains? So far, this challenge has been addressed by repurposing techniques developed for domain adaptation, such as linear mixture models, which combine estimates learned on homogeneous subdomains. However, learning from large heterogeneous corpora is quite different from standard adaptation tasks with clear domain distinctions. In this paper, we show that linear mixture models can reliably improve translation quality under very heterogeneous training conditions, even when the mixtures use no domain knowledge and attempt to learn generic models rather than adapt them to a target domain. This surprising finding opens new perspectives for using mixture models in machine translation beyond clear-cut domain adaptation tasks.
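To make the core idea concrete, below is a minimal sketch, not taken from the paper, of how a linear mixture combines conditional phrase-translation estimates from sub-corpus models: p(e|f) = Σ_k λ_k p_k(e|f), where each p_k is estimated on one subdomain and the λ_k are non-negative weights summing to one. The function name, toy tables (`news`, `web`), and weight values are all hypothetical illustrations.

```python
from collections import defaultdict

def interpolate_phrase_tables(tables, weights):
    """Linearly interpolate phrase-translation estimates:
    p(e | f) = sum_k weights[k] * p_k(e | f).

    `tables`  : list of dicts mapping (f, e) phrase pairs to probabilities,
                one dict per subdomain model.
    `weights` : list of non-negative mixture weights summing to 1.
    """
    assert abs(sum(weights) - 1.0) < 1e-9, "mixture weights must sum to 1"
    mixed = defaultdict(float)
    for table, w in zip(tables, weights):
        for pair, prob in table.items():
            mixed[pair] += w * prob  # weighted sum of sub-model estimates
    return dict(mixed)

# Toy example with two subdomain tables (hypothetical probabilities):
news = {("maison", "house"): 0.7, ("maison", "home"): 0.3}
web  = {("maison", "house"): 0.4, ("maison", "home"): 0.6}
print(interpolate_phrase_tables([news, web], [0.8, 0.2]))
# {('maison', 'house'): 0.64, ('maison', 'home'): 0.36}
```

In practice the weights λ_k are not fixed by hand as in this sketch but tuned, for example to maximize the likelihood of a development set or via discriminative training; the sketch only illustrates the interpolation step itself.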
