Fuzzy-Match Repair Guided by Quality Estimation

Computer-aided translation tools based on translation memories are widely used to assist professional translators. A translation memory (TM) consists of a set of translation units (TUs) made up of source- and target-language segment pairs. For the translation of a new source segment s', these tools search the TM and retrieve the TUs (s,t) whose source segments are most similar to s'. The translator then chooses a TU and edits the target segment t to turn it into an adequate translation of s'. Fuzzy-match repair (FMR) techniques can be used to automatically modify the parts of t that need to be edited. We describe a language-independent FMR method that first uses machine translation to generate, given s' and (s,t), a set of candidate fuzzy-match repaired segments, and then chooses the best one by estimating their quality. An evaluation on three different language pairs shows that the selected candidate is a good approximation to the best (oracle) candidate produced and is closer to reference translations than either machine-translated segments or unrepaired fuzzy matches (t). In addition, a single quality estimation model trained on a mix of data from all the languages performs well on any of the languages used.
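The retrieval and selection steps described above can be illustrated with a minimal Python sketch. It is not the implementation used in the paper: the fuzzy-match score is assumed here to be one minus the word-level edit distance normalised by the longer segment (a common convention in TM tools), the 0.6 threshold is arbitrary, and the names `fuzzy_match_score`, `retrieve`, `select_best` and `estimate_quality` are hypothetical. The quality estimator is passed in as a plain function, and higher values are assumed to mean better candidates.

```python
from typing import Callable, List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Word-level Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def fuzzy_match_score(s_new: str, s: str) -> float:
    """Assumed score: 1 - ED(s', s) / max(|s'|, |s|), computed over words."""
    a, b = s_new.split(), s.split()
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def retrieve(s_new: str,
             tm: List[Tuple[str, str]],
             threshold: float = 0.6) -> List[Tuple[float, str, str]]:
    """Return the TUs (score, s, t) at or above the threshold, best first."""
    scored = [(fuzzy_match_score(s_new, s), s, t) for s, t in tm]
    return sorted((tu for tu in scored if tu[0] >= threshold), reverse=True)


def select_best(candidates: List[str],
                estimate_quality: Callable[[str], float]) -> str:
    """Pick the candidate with the highest estimated quality.

    estimate_quality stands in for a trained quality estimation model.
    """
    return max(candidates, key=estimate_quality)
```

In this sketch, the candidate repaired segments themselves would come from a machine translation system applied to the mismatched parts of s' and s; that step is left out because it depends on the MT system available.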
