Italian and Spanish Null Subjects. A Case Study Evaluation in an MT Perspective

Thanks to their rich morphology, Italian and Spanish allow pro-drop, i.e., subject pronouns that are not lexically realized. Here we distinguish between two types of null subjects: personal pro-drop and impersonal pro-drop. We evaluate the translation of these two categories into French, a non-pro-drop language, using Its-2, a transfer-based system developed at our laboratory, and Moses, a statistical system. Three corpora are used: two subsets of the Europarl corpus and a third corpus built from newspaper articles. Null subjects turn out to be quantitatively important in all three corpora, but their distribution varies with the language and the text genre. From an MT perspective, translation results are determined by the type of pro-drop and the language pair involved. Impersonal pro-drop is harder to translate than personal pro-drop, especially from Italian into French, and a significant portion of the incorrect translations consists of missing pronouns.
