Quantitative analysis of translation revision: contrastive corpus research on native English and Chinese translationese

Demand for Chinese-to-English translation has increased in recent years. Resources for training Chinese-to-English translators, although growing, remain scarce relative to those for English-to-Chinese, for example. Corpus-based techniques are now widely acknowledged as appropriate for the study of translation, and a number of Chinese/English parallel translation corpora have been built and applied to research on translation practice. While such corpus resources have made a significant impact on these research areas, they suffer from problems due to the skewed nature of translated text, or ‘translationese’. Translators and translation systems trained on these parallel corpora would inherit these features. Comparable corpora, such as news articles and science and technology reports from the same period, are more readily available. Studying translation revision carried out by native speakers of English may offer one way into the study of Chinese-to-English translationese. However, very few quantitative studies of the products of the translation revision process have been carried out for any language pair. In this paper, we develop a framework, using techniques from corpus linguistics, to enable the quantitative study of the translation revision process, and we describe the initial results obtained. The research fits within a wider project to train language models in software tools that will assist in searching for non-native features of translated English texts.
