Fuzzy-Match Repair Guided by Quality Estimation

Computer-aided translation tools based on translation memories are widely used to assist professional translators. A translation memory (TM) consists of a set of translation units (TUs), each made up of a source-language segment and its target-language counterpart. To translate a new source segment s', these tools search the TM and retrieve the TUs (s,t) whose source segments are most similar to s'. The translator then chooses a TU and edits the target segment t to turn it into an adequate translation of s'. Fuzzy-match repair (FMR) techniques can be used to automatically modify the parts of t that need to be edited. We describe a language-independent FMR method that first uses machine translation to generate, given s' and (s,t), a set of candidate fuzzy-match repaired segments, and then chooses the best one by estimating their quality. An evaluation on three different language pairs shows that the selected candidate is a good approximation to the best (oracle) candidate produced and is closer to reference translations than machine-translated segments and unrepaired fuzzy matches (t). In addition, a single quality estimation model trained on a mix of data from all the languages performs well on any of the languages used.
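The retrieve-then-select flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a word-level edit-distance similarity, 1 - ED(s', s) / max(|s'|, |s|), as the fuzzy-match score, and an illustrative threshold of 0.6; neither is stated in the abstract. The names `fuzzy_match_score`, `retrieve`, and `select_best` are hypothetical, and `estimate_quality` is a placeholder for a trained quality estimation model. Generating the repaired candidates with machine translation is not shown.

```python
from typing import Callable, List, Tuple


def word_edit_distance(a: List[str], b: List[str]) -> int:
    """Word-level edit distance computed by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # delete word_a
                            curr[j - 1] + 1,      # insert word_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def fuzzy_match_score(s_new: str, s_tm: str) -> float:
    """Assumed similarity in [0, 1]; 1.0 means identical token sequences."""
    a, b = s_new.split(), s_tm.split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - word_edit_distance(a, b) / longest


def retrieve(s_new: str,
             tm: List[Tuple[str, str]],
             threshold: float = 0.6) -> List[Tuple[float, str, str]]:
    """Return (score, s, t) for TUs at or above the threshold, best first."""
    scored = [(fuzzy_match_score(s_new, s), s, t) for s, t in tm]
    hits = [item for item in scored if item[0] >= threshold]
    return sorted(hits, key=lambda item: item[0], reverse=True)


def select_best(candidates: List[str],
                estimate_quality: Callable[[str], float]) -> str:
    """Pick the candidate with the highest estimated quality.

    estimate_quality is a stand-in for a trained quality estimation model.
    """
    return max(candidates, key=estimate_quality)
```

In use, `retrieve` would supply the (s,t) pairs to repair, and `select_best` would be applied to the repaired candidates produced for one of them, with any callable that maps a candidate string to a score standing in for the quality estimator.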

John E. Ortega | Mikel L. Forcada | Felipe Sánchez-Martínez
