Towards a Literary Machine Translation: The Role of Referential Cohesion

What is the role of textual features above the sentence level in advancing the machine translation of literature? This paper examines how referential cohesion is expressed in literary and non-literary texts and how this cohesion affects translation. We first show, in a corpus study of English, that literary texts use denser reference chains than news, expressing greater referential cohesion. We then compare the referential cohesion of machine and human translations of Chinese literature and news. While human translators capture the greater referential cohesion of literature, Google translations capture it less well. Our results suggest that incorporating discourse features above the sentence level is an important direction for MT research if MT is to be applied to literature.
