Improving translation memory matching and retrieval using paraphrases

Most current translation memory (TM) systems work at the string level (character or word level) and lack semantic knowledge while matching. They use simple edit distance (ED) calculated on the surface form or some variation of it (stem, lemma), which does not take semantic aspects into consideration. This paper presents a novel and efficient approach to incorporating semantic information, in the form of paraphrasing (PP), into the ED metric. The approach computes ED while efficiently considering paraphrases, using dynamic programming and greedy approximation. In addition to automatic evaluation metrics such as BLEU and METEOR, we carried out an extensive human evaluation in which we measured post-editing time, keystrokes, HTER, and HMETEOR, and conducted three rounds of subjective evaluation. Our results show that PP substantially improves TM matching and retrieval, leading to better translation performance when translators use paraphrase-enhanced TMs.
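
To make the idea of paraphrase-aware edit distance concrete, the following is a minimal sketch and not the authors' implementation. It assumes a word-level Levenshtein distance computed by dynamic programming, extended with one extra transition: a span of the input that has a listed paraphrase matching a span of the TM segment is aligned at zero cost. The names `paraphrase_edit_distance` and `fuzzy_match_score`, the shape of the paraphrase table, and the normalisation by the longer segment length are all assumptions made for illustration; the greedy approximation mentioned in the abstract is not modelled here.

```python
from typing import Dict, List, Set, Tuple

Phrase = Tuple[str, ...]


def paraphrase_edit_distance(
    src: List[str],
    tm: List[str],
    paraphrases: Dict[Phrase, Set[Phrase]],
) -> int:
    """Word-level edit distance where listed paraphrases match at zero cost.

    `paraphrases` maps a non-empty phrase of the input segment to the set of
    phrases regarded as equivalent (a hypothetical table format).
    """
    n, m = len(src), len(tm)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # delete all i input words
    for j in range(m + 1):
        dist[0][j] = j  # insert all j TM words

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if src[i - 1] == tm[j - 1] else 1
            best = min(
                dist[i - 1][j] + 1,        # deletion
                dist[i][j - 1] + 1,        # insertion
                dist[i - 1][j - 1] + sub,  # match or substitution
            )
            # Extra transition: an input span ending at i whose paraphrase
            # equals a TM span ending at j is aligned at no cost.
            for phrase, alternatives in paraphrases.items():
                k = len(phrase)
                if k == 0 or k > i or tuple(src[i - k:i]) != phrase:
                    continue
                for alt in alternatives:
                    l = len(alt)
                    if 0 < l <= j and tuple(tm[j - l:j]) == alt:
                        best = min(best, dist[i - k][j - l])
            dist[i][j] = best
    return dist[n][m]


def fuzzy_match_score(
    src: List[str],
    tm: List[str],
    paraphrases: Dict[Phrase, Set[Phrase]],
) -> float:
    """Similarity in [0, 1]; normalising by the longer segment is an assumption."""
    longest = max(len(src), len(tm))
    if longest == 0:
        return 1.0
    return 1.0 - paraphrase_edit_distance(src, tm, paraphrases) / longest


if __name__ == "__main__":
    src = "the period laid down in article 4".split()
    tm = "the period referred to in article 4".split()
    table = {("laid", "down"): {("referred", "to")}}
    print(fuzzy_match_score(src, tm, {}))     # surface-form matching only
    print(fuzzy_match_score(src, tm, table))  # paraphrase-aware matching
```

In this toy example the two segments differ only by a paraphrased phrase, so the paraphrase-aware score is higher than the plain surface-form score, which is the effect on TM retrieval that the abstract describes.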
