Combining translation memories and statistical machine translation using sparse features

Combining translation memories (TMs) with statistical machine translation (SMT) has been shown to be beneficial. In this paper, we present a combination approach that integrates TMs into SMT through sparse features extracted at run-time during decoding. These features can be used in both phrase-based and syntax-based SMT. We conducted experiments on a publicly available English–French data set and an English–Spanish industrial data set. Our experimental results show that these features significantly improve our phrase-based and syntax-based SMT baselines on both language pairs.
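To make the idea of run-time sparse TM features concrete, the following is a minimal sketch, not the feature set used in the paper. It assumes a word-level fuzzy match score (1 minus edit distance over the longer sentence length) and a single hypothetical indicator feature that records whether a candidate target phrase occurs in the target side of the best TM match, binned by that score. All function names and the feature naming scheme (`tm_hit_bin8` and so on) are illustrative assumptions.

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute or match
        prev = curr
    return prev[-1]


def fuzzy_match_score(a, b):
    """Assumed similarity: 1 - edit_distance / longer length (1.0 = identical)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def best_tm_match(source, tm):
    """Return (score, tm_source, tm_target) for the closest TM entry."""
    scored = [(fuzzy_match_score(source, s), s, t) for s, t in tm]
    return max(scored, key=lambda item: item[0])


def tm_sparse_features(candidate_phrase, tm_target, score):
    """Fire one hypothetical indicator feature for a candidate target phrase.

    The feature name encodes whether the phrase occurs contiguously in the
    TM target and which fuzzy-match bin (0..10) the input sentence falls in.
    """
    n = len(candidate_phrase)
    in_tm = any(tm_target[i:i + n] == candidate_phrase
                for i in range(len(tm_target) - n + 1))
    score_bin = int(score * 10)
    return {f"tm_{'hit' if in_tm else 'miss'}_bin{score_bin}": 1}


# Toy translation memory: (source tokens, target tokens) pairs.
tm = [
    ("the printer is out of paper".split(),
     "l' imprimante n' a plus de papier".split()),
    ("restart the computer".split(),
     "redémarrez l' ordinateur".split()),
]
source = "the printer is out of ink".split()
score, tm_source, tm_target = best_tm_match(source, tm)

hit = tm_sparse_features(["l'", "imprimante"], tm_target, score)
miss = tm_sparse_features(["encre"], tm_target, score)
```

In a decoder, features of this kind would be computed for each translation option as it is applied, and their weights learned during tuning together with the dense model scores. Here `hit` contains `tm_hit_bin8` and `miss` contains `tm_miss_bin8`, since the input differs from the best TM source by one word out of six.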
