Anaphora Resolution with the ARRAU Corpus

The ARRAU corpus is an anaphorically annotated corpus of English providing rich linguistic information about anaphora resolution. The most distinctive feature of the corpus is the annotation of a wide range of anaphoric relations, including bridging references and discourse deixis in addition to identity (coreference). Other distinctive features include treating all NPs as markables, including non-referring NPs, and annotating a variety of morphosyntactic and semantic mention and entity attributes, including the genericity status of the entities referred to by markables. However, the corpus has not been extensively used for anaphora resolution research so far. In this paper, we discuss three datasets extracted from the ARRAU corpus to support the three subtasks of the CRAC 2018 Shared Task: identity anaphora resolution over ARRAU-style markables, bridging reference resolution, and discourse deixis resolution. We also discuss the evaluation scripts that assess system performance on those datasets, and preliminary results on the three tasks that may serve as baselines for subsequent research on these phenomena.
