Using Machine Translation to Provide Target-Language Edit Hints in Computer Aided Translation Based on Translation Memories

This paper explores the use of general-purpose machine translation (MT) to assist users of computer-aided translation (CAT) systems based on translation memories (TMs) in identifying which target words in the translation proposals need to be changed (either replaced or removed) and which can be kept unedited, a task we term word-keeping recommendation. MT is used as a black box to align source and target sub-segments on the fly in the translation units (TUs) suggested to the user. The source-language (SL) and target-language (TL) segments of each matching TU are split into overlapping sub-segments of variable length, which are machine-translated into the TL and the SL, respectively. The resulting bilingual sub-segments, together with the matching between the SL segment in the TU and the segment to be translated, are used to build the features that a binary classifier uses to decide which target words should be changed and which should be kept unedited. In this approach, MT output is never presented to the translator. Two approaches are presented: one based on a word-keeping recommendation system that can be trained on the TM used with the CAT system, and a more basic approach that does not require any training. Experiments simulate the translation of texts in several language pairs, using corpora from different domains and three different MT systems. We compare the performance obtained with that of previous work that used statistical word alignment for word-keeping recommendation, and show that the MT-based approaches presented here are more accurate in most scenarios. In particular, our results confirm that the MT-based approaches outperform the alignment-based approach when models are trained on out-of-domain TMs. Additional experiments were performed to check how dependent the MT-based recommender is on the language pair and MT system used for training. 
These experiments confirm a high degree of reusability of the recommendation models across various MT systems, but a low level of reusability across language pairs.
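The on-the-fly alignment and the training-free variant described above can be illustrated with a small sketch. It is not the authors' implementation and rests on several assumptions: the MT systems are replaced by hypothetical toy lookup tables (`EN_ES`, `ES_EN`), the matching between the TU source and the new segment uses Python's `difflib.SequenceMatcher` as a stand-in for an edit-distance word alignment, the maximum sub-segment length `max_len=3` is arbitrary, and the decision rule in the hypothetical `recommend` function (keep a target word if at least one fully matched source sub-segment translates onto it) is an assumed heuristic rather than the paper's classifier.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Optional, Set, Tuple

Span = Tuple[int, int]                 # half-open token range [start, end)
MT = Callable[[List[str]], List[str]]  # black-box MT: tokens in, tokens out


def subsegments(tokens: List[str], max_len: int) -> List[Span]:
    """All overlapping contiguous spans of 1..max_len tokens."""
    return [(i, i + n)
            for n in range(1, max_len + 1)
            for i in range(len(tokens) - n + 1)]


def find(needle: List[str], haystack: List[str]) -> List[Span]:
    """Every position where `needle` occurs contiguously in `haystack`."""
    n = len(needle)
    if n == 0:  # an empty MT output gives no evidence
        return []
    return [(i, i + n) for i in range(len(haystack) - n + 1)
            if haystack[i:i + n] == needle]


def aligned_pairs(src: List[str], tgt: List[str], s2t: MT, t2s: MT,
                  max_len: int) -> Set[Tuple[Span, Span]]:
    """(source span, target span) pairs obtained by translating sub-segments
    of one side and finding the output verbatim in the other side."""
    pairs = set()
    for s in subsegments(src, max_len):          # SL -> TL direction
        for t in find(s2t(src[s[0]:s[1]]), tgt):
            pairs.add((s, t))
    for t in subsegments(tgt, max_len):          # TL -> SL direction
        for s in find(t2s(tgt[t[0]:t[1]]), src):
            pairs.add((s, t))
    return pairs


def matched_source_words(new_src: List[str], tu_src: List[str]) -> Set[int]:
    """Indices of TU source words that match words of the new segment
    (difflib is a stand-in for an edit-distance word alignment)."""
    sm = SequenceMatcher(a=tu_src, b=new_src, autojunk=False)
    return {i for m in sm.get_matching_blocks()
            for i in range(m.a, m.a + m.size)}


def recommend(new_src: List[str], tu_src: List[str], tu_tgt: List[str],
              s2t: MT, t2s: MT, max_len: int = 3) -> List[Optional[str]]:
    """Assumed rule: "keep" if some fully matched source sub-segment
    translates onto the word; "change" if it is covered only by sub-segments
    containing unmatched words; None if no sub-segment covers it."""
    matched = matched_source_words(new_src, tu_src)
    covered = [False] * len(tu_tgt)
    kept = [False] * len(tu_tgt)
    for (s0, s1), (t0, t1) in aligned_pairs(tu_src, tu_tgt, s2t, t2s, max_len):
        fully_matched = all(i in matched for i in range(s0, s1))
        for j in range(t0, t1):
            covered[j] = True
            kept[j] = kept[j] or fully_matched
    return ["keep" if k else "change" if c else None
            for k, c in zip(kept, covered)]


# Hypothetical toy "MT systems" (lookup tables) standing in for real MT.
EN_ES = {("the",): ["la"], ("red",): ["roja"], ("house",): ["casa"],
         ("red", "house"): ["casa", "roja"],
         ("the", "red", "house"): ["la", "casa", "roja"]}
ES_EN = {tuple(v): list(k) for k, v in EN_ES.items()}


def en_es(seg: List[str]) -> List[str]:
    return EN_ES.get(tuple(seg), [])


def es_en(seg: List[str]) -> List[str]:
    return ES_EN.get(tuple(seg), [])


print(recommend("the blue house".split(), "the red house".split(),
                "la casa roja".split(), en_es, es_en))
# ['keep', 'keep', 'change']
```

Here the new segment *the blue house* differs from the TU source *the red house* only in *red*, so *roja* is marked for change while *la* and *casa* are kept, even though longer sub-segments such as *red house* / *casa roja* also cover *casa*. A trained recommender would instead turn this kind of sub-segment evidence into features for a binary classifier, which can learn when such overlapping evidence should outweigh the fully matched sub-segments.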
