AnCora-CO: Coreferentially annotated corpora for Spanish and Catalan

This article describes the enrichment of the AnCora corpora of Spanish and Catalan (400k words each) with coreference links between pronouns (including elliptical subjects and clitics), full noun phrases (including proper nouns), and discourse segments. The coding scheme distinguishes between identity links, predicative relations, and discourse deixis. Inter-annotator agreement on the link types is 85–89% above chance, and we analyze the sources of disagreement. The resulting corpora make it possible to train and test learning-based algorithms for automatic coreference resolution, and to carry out bottom-up linguistic descriptions of coreference relations as they occur in real data.
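As an illustration of what agreement "above chance" means, the sketch below computes a chance-corrected agreement score for two annotators who each assign one link type to the same set of items. The abstract does not say which coefficient was used, so Cohen's kappa is assumed here purely as an example; the function name, the label strings, and the toy data are all hypothetical and are not taken from the corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (Cohen's kappa).

    Assumption: both annotators label the same items, one label per item.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)

    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement: probability of agreeing by chance, estimated
    # from each annotator's own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy annotations over ten links (not real corpus data).
annotator_1 = ["identity"] * 6 + ["predicative"] * 2 + ["discourse_deixis"] * 2
annotator_2 = (["identity"] * 5 + ["predicative"] * 3
               + ["discourse_deixis"] + ["identity"])

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

In this toy case the annotators agree on 8 of 10 items, but part of that agreement would be expected by chance because both use the "identity" label most often, so the chance-corrected score is lower than the raw 80%.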
