On the features of translationese

In memoriam: Miriam Shlesinger, 1947–2012

Much research in translation studies indicates that translated texts are ontologically different from original, non-translated ones. Translated texts, in any language, can be considered a dialect of that language, known as ‘translationese’. Several characteristics of translationese have been proposed as universal in a series of hypotheses. In this work, we test these hypotheses using a computational methodology based on supervised machine learning. We define several classifiers that implement various linguistically informed features, and assess the degree to which different sets of features can distinguish between translated and original texts. We demonstrate that some feature sets are indeed good indicators of translationese, thereby corroborating some hypotheses, whereas others perform much worse (sometimes at chance level), indicating that some ‘universal’ assumptions have to be reconsidered.
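The supervised methodology described above can be illustrated with a small, dependency-free sketch. Each text chunk is mapped to a feature vector, a classifier is trained on labelled original and translated chunks, and the cross-validated accuracy of that feature set is reported. The function-word list, the type-token-ratio and sentence-length features, the nearest-centroid classifier, and the choice of k = 10 folds are all illustrative assumptions made for this sketch. They are not the feature sets or the classifier used in the study.

```python
import math
import random
import re
from collections import Counter

# HYPOTHETICAL feature set for illustration only: relative frequencies of a few
# function words, type-token ratio, and mean sentence length.
FUNCTION_WORDS = ["the", "of", "and", "that", "to", "in", "is", "it"]


def features(text):
    """Map a text chunk to a fixed-length numeric feature vector."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n = len(tokens) or 1
    counts = Counter(tokens)
    vec = [counts[w] / n for w in FUNCTION_WORDS]
    vec.append(len(counts) / n)  # type-token ratio (lexical variety)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    vec.append(n / max(len(sentences), 1))  # mean sentence length in tokens
    return vec


def standardize(train_X, test_X):
    """Z-score every column using statistics from the training fold only."""
    cols = list(zip(*train_X))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has std 0; fall back to 1.0 to avoid division by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]

    def scale(x):
        return [(v - m) / s for v, m, s in zip(x, means, stds)]

    return [scale(x) for x in train_X], [scale(x) for x in test_X]


def fit_centroids(X, y):
    """Average the feature vectors of each class (nearest-centroid training)."""
    centroids = {}
    for label in set(y):
        rows = [r for r, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(c) / len(c) for c in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Return the label whose centroid is closest to x (Euclidean distance)."""
    return min(centroids, key=lambda lab: math.dist(centroids[lab], x))


def cross_validate(texts, labels, k=10, seed=0):
    """Return the k-fold cross-validated accuracy of the feature set."""
    X = [features(t) for t in texts]
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    correct = 0
    for f in range(k):
        test_idx = order[f::k]
        if not test_idx:
            continue
        held_out = set(test_idx)
        train_idx = [i for i in order if i not in held_out]
        train_X, test_X = standardize([X[i] for i in train_idx],
                                      [X[i] for i in test_idx])
        centroids = fit_centroids(train_X, [labels[i] for i in train_idx])
        correct += sum(predict(centroids, x) == labels[i]
                       for x, i in zip(test_X, test_idx))
    return correct / len(X)
```

Comparing feature sets works by holding the classifier and folds fixed and swapping in a different `features` function. With balanced classes, accuracy near 1.0 would suggest a strong marker of translationese, while accuracy near 0.5 is chance level.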
