Linguistic Features of Genre and Method Variation in Translation: A Computational Perspective

In this paper we describe the use of text classification methods to investigate genre and method variation in an English-German translation corpus. For this purpose we use linguistically motivated features, representing texts as part-of-speech tags arranged in bigrams, trigrams, and 4-grams. The classification method used in this paper is a Bayesian classifier with Laplace smoothing. We use the output of the classifiers to carry out an extensive feature analysis of the main differences between genres and methods of translation.
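
The abstract names the ingredients (part-of-speech n-grams of length 2 to 4 and a Bayesian classifier with Laplace smoothing) but not the implementation. The following is a minimal sketch of how such a setup could look, assuming a multinomial Naive Bayes model with add-one smoothing. The tag names, the labels `genre_a` and `genre_b`, the toy training data, and the helper names `pos_ngrams` and `LaplaceNaiveBayes` are all hypothetical and do not come from the paper.

```python
import math
from collections import Counter, defaultdict


def pos_ngrams(tags, n_values=(2, 3, 4)):
    """Return all POS n-grams (as tuples) for each n in n_values."""
    grams = []
    for n in n_values:
        for i in range(len(tags) - n + 1):
            grams.append(tuple(tags[i:i + n]))
    return grams


class LaplaceNaiveBayes:
    """Multinomial Naive Bayes over POS n-grams with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # alpha = 1.0 corresponds to Laplace (add-one) smoothing

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for tags, label in zip(docs, labels):
            grams = pos_ngrams(tags)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def log_scores(self, tags):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_docs / total_docs)  # log prior
            for gram in pos_ngrams(tags):
                if gram in self.vocab:  # n-grams unseen in training are skipped
                    score += math.log((counts[gram] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, tags):
        scores = self.log_scores(tags)
        return max(scores, key=scores.get)


# Hypothetical toy data: each document is a sequence of POS tags.
train_docs = [
    ["PRON", "VERB", "DET", "NOUN", "PUNCT"],
    ["PRON", "VERB", "ADV", "ADJ", "PUNCT"],
    ["DET", "NOUN", "VERB", "DET", "NOUN", "PUNCT"],
    ["DET", "ADJ", "NOUN", "VERB", "NOUN", "PUNCT"],
]
train_labels = ["genre_a", "genre_a", "genre_b", "genre_b"]

model = LaplaceNaiveBayes(alpha=1.0).fit(train_docs, train_labels)
print(model.predict(["PRON", "VERB", "DET", "NOUN", "PUNCT"]))
```

Because the smoothed per-class probabilities of each n-gram are kept in `feature_counts`, the same structure can be inspected after training to see which n-grams separate the classes most strongly, which is the kind of feature analysis the abstract refers to.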

Marcos Zampieri | Ekaterina Lapshinova-Koltunski
