Source and Translation Classification using Most Frequent Words

Translation scholars have recently made general claims about the properties of translated text. Some of these properties are independent of the source language, while others are not. Koppel and Ordan (2011) validated both types of properties empirically, using English source texts and texts translated into English. Corpora of this sort, which focus on a single language, are not adequate for claiming that translation properties are universal. In this paper, we validate both types of translation properties using original and translated texts from six European languages.

[1]  Deryle Lonsdale, et al. A Frequency Dictionary of French: Core Vocabulary for Learners, 2009.

[2]  Jaroslaw Kwapien, et al. Linguistic complexity: English vs. Polish, text vs. corpus, 2010, ArXiv.

[3]  Mona Baker, et al. Corpus-based Translation Studies: The Challenges that Lie Ahead, 1996.

[4]  Moshe Koppel, et al. Translationese and Its Dialects, 2011, ACL.

[5]  Nicoletta Calzolari, et al. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014), 2014, LREC.

[6]  Mona Baker, et al. Corpus Linguistics and Translation Studies: Implications and Applications, 1993.

[7]  Diana Inkpen, et al. Towards Simplification: A Supervised Learning Approach, 2010.

[8]  Maeve Olohan. Spelling Out the Optionals in Translation: a Corpus Study, 2001.

[9]  Gideon Toury, et al. Descriptive translation studies and beyond, 1995.

[10]  Philipp Koehn, et al. Europarl: A Parallel Corpus for Statistical Machine Translation, 2005, MTSUMMIT.

[11]  Nello Cristianini, et al. Classification using String Kernels, 2000.

[12]  Alexander Mehler, et al. Customization of the Europarl Corpus for Translation Studies, 2012, LREC.

[13]  A. Pym, et al. Explaining Explicitation, 2005.

[14]  Robert Forkel, et al. The World Atlas of Language Structures Online, 2009.

[15]  Ian H. Witten, et al. The WEKA data mining software: an update, 2009, SKDD.

[16]  Diana Inkpen, et al. Identification of Translationese: A Machine Learning Approach, 2010, CICLing.

[17]  Mona Baker. Corpus-Based Translation Studies, 1996, Researching Translation in the Age of Technology and Global Conflict.

[18]  Ruslan Mitkov, et al. Translation universals: do they exist? A corpus-based NLP study of convergence and simplification, 2008, AMTA.

[19]  Silvia Bernardini, et al. A New Approach to the Study of Translationese: Machine-learning the Difference between Original and Translated Text, 2005, Lit. Linguistic Comput.

[20]  Hans van Halteren, et al. Source Language Markers in EUROPARL Translations, 2008, COLING.

[21]  Andrew McCallum, et al. A comparison of event models for naive bayes text classification, 1998, AAAI.

[22]  Marius Popescu, et al. Studying Translationese at the Character Level, 2011, RANLP.

[23]  Sattar Izwaini. Building specialised corpora for translation studies, 2003.

[24]  W. Bruce Croft. Typology and Universals, 1990.

[25]  Jason M. Brenier, et al. Predictability Effects on Durations of Content and Function Words in Conversational English, 2009.

[26]  Shuly Wintner, et al. Language Models for Machine Translation: Original vs. Translated Texts, 2011, CL.

[27]  Sara Laviosa, et al. Corpus-based Translation Studies: Theory, Findings, Applications, 2002.