Towards Recognizing Phrase Translation Processes: Experiments on English-French

When translating phrases (words or groups of words), human translators, consciously or not, resort to translation processes other than literal translation, such as Idiom Equivalence, Generalization, Particularization and Semantic Modulation. Translators and linguists (e.g. Vinay and Darbelnet, Newmark) have proposed several typologies to characterize these translation processes. However, to the best of our knowledge, there has been no effort to automatically classify such fine-grained translation processes. Recently, an English-French parallel corpus of TED Talks has been manually annotated with translation process categories, and annotation guidelines have been established for the task. Based on these annotated examples, we propose an automatic classification of translation processes at the sub-sentential level. Experimental results show that we can distinguish non-literal translation from literal translation with an accuracy of 87.09%, and classify among five non-literal translation processes with an accuracy of 55.20%. This work demonstrates that it is possible to automatically classify translation processes. Even with a small number of annotated examples, our experiments point to directions that we can follow in future work. One of our long-term objectives is to leverage this automatic classification to better control paraphrase extraction from bilingual parallel corpora.
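The two experiments reported above correspond to two classification tasks over the same annotated phrase pairs: literal versus non-literal, and a fine-grained choice among non-literal processes. A minimal Python sketch of how the two task datasets might be derived and scored is given here, assuming each annotated example is an (English phrase, French phrase, label) triple. The label strings, the helper names (`binary_task`, `fine_grained_task`, `majority_baseline`) and the toy pairs are hypothetical and not taken from the annotated corpus. The majority-class baseline is included only as a common reference point for reading accuracy figures; the abstract does not say which baselines were used.

```python
from collections import Counter
from typing import List, Tuple

# (English phrase, French phrase, annotated translation process).
# HYPOTHETICAL label strings: the exact label inventory of the corpus is not given here.
Pair = Tuple[str, str, str]
LITERAL = "Literal"


def binary_task(pairs: List[Pair]) -> List[Pair]:
    """Task 1: all pairs, with every non-literal process collapsed into one class."""
    return [(en, fr, LITERAL if lab == LITERAL else "Non-literal") for en, fr, lab in pairs]


def fine_grained_task(pairs: List[Pair]) -> List[Pair]:
    """Task 2: only non-literal pairs, keeping their fine-grained process label."""
    return [(en, fr, lab) for en, fr, lab in pairs if lab != LITERAL]


def accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of examples whose predicted label equals the gold label."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def majority_baseline(gold: List[str]) -> float:
    """Accuracy obtained by always predicting the most frequent gold label."""
    if not gold:
        raise ValueError("gold must be non-empty")
    return Counter(gold).most_common(1)[0][1] / len(gold)


# Toy, hand-made examples for illustration only (NOT from the TED Talks corpus).
PAIRS: List[Pair] = [
    ("the red car", "la voiture rouge", "Literal"),
    ("a difficult question", "une question difficile", "Literal"),
    ("it's raining cats and dogs", "il pleut des cordes", "Idiom Equivalence"),
    ("sneakers", "chaussures", "Generalization"),
    ("it is not difficult to show", "il est facile de démontrer", "Semantic Modulation"),
]

if __name__ == "__main__":
    binary = binary_task(PAIRS)
    fine = fine_grained_task(PAIRS)
    print(len(binary), len(fine))  # 5 3
    print(majority_baseline([lab for _, _, lab in binary]))  # 0.6
```

Restricting the fine-grained task to non-literal pairs mirrors the description of classifying "among five non-literal translation processes"; whether the original experiments used two independent classifiers or a pipeline is not stated in the abstract.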

[1] Ben Taskar et al. Alignment by Agreement, 2006, NAACL.

[2] Nigel Collier et al. SemEval-2017 Task 2: Multilingual and Cross-lingual Semantic Word Similarity, 2017, SemEval.

[3] Lucía Molina et al. Translation Techniques Revisited: A Dynamic and Functionalist Approach, 2004.

[4] Michel Paillard et al. Approche linguistique des problèmes de traduction anglais-français, 1989.

[5] Marine Carpuat et al. Detecting Cross-Lingual Semantic Divergence for Neural Machine Translation, 2017, NMT@ACL.

[6] Jian Sun et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7] Chris Callison-Burch et al. PPDB: The Paraphrase Database, 2013, NAACL.

[8] Robyn Speer et al. ConceptNet at SemEval-2017 Task 2: Extending Word Embeddings with Multilingual Relational Knowledge, 2017, SemEval.

[9] Ronan Collobert et al. Neural Network-based Word Alignment through Score Aggregation, 2016, WMT.

[10] Marine Carpuat et al. Identifying Semantic Divergences in Parallel Text without Annotations, 2018, NAACL.

[11] Malvina Nissim et al. Adding Semantics to Data-Driven Paraphrasing, 2015, ACL.

[12] Chris Callison-Burch et al. Paraphrasing with Bilingual Parallel Corpora, 2005, ACL.

[13] Helmut Schmid et al. Improvements in Part-of-Speech Tagging with an Application to German, 1999.

[14] Michael Carl et al. Why Translation Is Difficult: A Corpus-Based Study of Non-Literality in Post-Editing and From-Scratch Translation, 2017.

[15] Ralph Weischedel et al. A Study of Translation Error Rate with Targeted Human Annotation, 2005.

[16] Matthew G. Snover et al. A Study of Translation Edit Rate with Targeted Human Annotation, 2006, AMTA.

[17] Catherine Havasi et al. ConceptNet 5.5: An Open Multilingual Graph of General Knowledge, 2016, AAAI.

[18] Matteo Negri et al. Findings of the WMT 2018 Shared Task on Automatic Post-Editing, 2018, WMT.

[19] François Yvon et al. Fixing Translation Divergences in Parallel Corpora for Neural MT, 2018, EMNLP.

[20] Mihai Surdeanu et al. The Stanford CoreNLP Natural Language Processing Toolkit, 2014, ACL.

[21] Daniel Marcu et al. Statistical Phrase-Based Translation, 2003, NAACL.

[22] Yuji Matsumoto et al. Automatic Construction of Machine Translation Knowledge Using Translation Literalness, 2003, EACL.

[23] Jacob Cohen. A Coefficient of Agreement for Nominal Scales, 1960.

[24] Jean-Paul Vinay et al. A Reading of the Book Stylistique comparée du français et de l'anglais: méthode de traduction, 2018.

[25] Salim Roukos et al. BLEU: A Method for Automatic Evaluation of Machine Translation, 2002, ACL.

[26] Vladimir I. Levenshtein. Binary Codes Capable of Correcting Deletions, Insertions, and Reversals, 1965.

[27] Mirella Lapata et al. Paraphrasing Revisited with Neural Machine Translation, 2017, EACL.

[28] Slav Petrov et al. A Universal Part-of-Speech Tagset, 2011, LREC.

[29] Joakim Nivre et al. Benchmarking of Statistical Dependency Parsers for French, 2010, COLING.

[30] Anne Vilnat et al. Construction of a Multilingual Corpus Annotated with Translation Relations, 2018.

[31] Grigorios Tsoumakas et al. Random k-Labelsets for Multilabel Classification, 2022.

[32] R. Gray. Entropy and Information Theory, 1990, Springer New York.

[33] Nizar Habash et al. DUSTer: A Method for Unraveling Cross-Language Divergences for Statistical Word-Level Alignment, 2002, AMTA.

[34] Nianwen Xue et al. Translation Divergences in Chinese–English Machine Translation: An Empirical Investigation, 2017, CL.

[35] Louis Dorn. Approaches to Translation (Language Teaching Methodology Series). Peter Newmark. Oxford: Pergamon Press, 1981. Pp. 213, 1985.

[36] Gaël Varoquaux et al. Scikit-learn: Machine Learning in Python, 2011, J. Mach. Learn. Res.

[37] Tomas Mikolov et al. Enriching Word Vectors with Subword Information, 2016, TACL.

[38] P. Newmark. A Textbook of Translation, 1988.