A Quantitative Insight into the Impact of Translation on Readability

This paper investigates the impact of translation on readability. We propose a quantitative analysis of several shallow, lexical, and morpho-syntactic features that have traditionally been used to assess readability and have proven relevant for this task. We conduct our experiments on a parallel corpus of transcribed parliamentary sessions, investigating readability metrics for both the original text segments, written in the language of the speaker, and their translations.
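
As a rough illustration of what comparing shallow and lexical features between an original segment and its translation can look like, the following Python sketch computes three simple measures. The feature set (average sentence length, average word length, type/token ratio), the function name `shallow_features`, the naive regex tokenization, and the example sentences are all assumptions made for illustration; they are not the features or tools used in the paper.

```python
import re


def shallow_features(text):
    """Compute a few illustrative shallow/lexical features for a text.

    Hypothetical helper: sentence splitting and tokenization are naive
    regex heuristics, not a real language-aware pipeline.
    """
    # Split on runs of sentence-final punctuation and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # \w+ matches Unicode word characters in Python 3, so accented
    # letters are kept inside tokens; apostrophes split tokens.
    words = re.findall(r"\w+", text.lower())
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    return {
        "avg_sentence_length": len(words) / len(sentences),
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "type_token_ratio": len(set(words)) / len(words),
    }


# Invented example pair: an "original" segment and a "translation".
original = "The committee met today. It adopted the report."
translation = "La commission s'est réunie aujourd'hui. Elle a adopté le rapport."

orig_feats = shallow_features(original)
trans_feats = shallow_features(translation)

# Print each feature side by side for the two versions.
for name in orig_feats:
    print(f"{name}: original={orig_feats[name]:.2f} "
          f"translation={trans_feats[name]:.2f}")
```

Measures like these are language-dependent, so differences between an original and its translation reflect properties of the two languages as well as any effect of translation itself.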
