Don't Annotate, but Validate: a Data-to-Text Method for Capturing Event Data

In this paper, we present a new method for obtaining large volumes of high-quality text corpora with event data for studying identity and reference relations. We report on current methods for creating event reference data, which annotate texts and derive the event data a posteriori. Our method instead starts from event registries in which event data is defined a priori. From this data, we extract so-called Microworlds of referential data, together with the Reference Texts that report on these events. This makes it possible to establish referential relations easily, with high precision, and at a large scale. In a pilot study, we obtained data from these resources that exhibits extreme ambiguity and variation, while preserving the identity and reference relations and without having to annotate large quantities of text word by word. The data from this pilot was annotated with a tool created specifically to validate our method and to enrich the reference texts with event coreference annotations. This annotation process resulted in the Gun Violence Corpus, whose development and outcome are described in this paper.
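To make the data-to-text idea concrete, the following is a minimal Python sketch under stated assumptions. The registry record fields (`incident_id`, `date`, `location`, `source_urls`), the `Microworld` class, and the `build_microworld` selection predicate are hypothetical illustrations, not the schema of any real event registry or the authors' implementation. The sketch shows how, once events are defined a priori and each event links to the texts that report on it, document-level reference relations between texts follow directly from the data rather than from word-by-word annotation.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from itertools import combinations


@dataclass(frozen=True)
class RegistryRecord:
    """One structured event entry from a hypothetical event registry."""
    incident_id: str
    date: str
    location: str
    source_urls: tuple[str, ...]  # reference texts reporting on this event


@dataclass
class Microworld:
    """Events defined a priori plus the reference texts that report on them."""
    events: dict[str, RegistryRecord] = field(default_factory=dict)
    texts_per_event: dict[str, set[str]] = field(
        default_factory=lambda: defaultdict(set)
    )

    def add(self, record: RegistryRecord) -> None:
        self.events[record.incident_id] = record
        self.texts_per_event[record.incident_id].update(record.source_urls)

    def coreferent_text_pairs(self) -> set[tuple[str, str]]:
        """Texts linked to the same registry event refer to the same event."""
        pairs: set[tuple[str, str]] = set()
        for urls in self.texts_per_event.values():
            pairs.update(combinations(sorted(urls), 2))
        return pairs

    def non_coreferent_text_pairs(self) -> set[tuple[str, str]]:
        """Text pairs in this microworld that report on different events."""
        all_texts = sorted(set().union(*self.texts_per_event.values()))
        return set(combinations(all_texts, 2)) - self.coreferent_text_pairs()


def build_microworld(records, keep) -> Microworld:
    """Collect the records selected by the (hypothetical) predicate `keep`."""
    world = Microworld()
    for record in records:
        if keep(record):
            world.add(record)
    return world


# Usage: two distinct events with the same date and location form an
# ambiguous microworld; a third event elsewhere is filtered out.
records = [
    RegistryRecord("e1", "2017-01-03", "Springfield", ("a.html", "b.html")),
    RegistryRecord("e2", "2017-01-03", "Springfield", ("c.html",)),
    RegistryRecord("e3", "2017-01-03", "Shelbyville", ("d.html",)),
]
world = build_microworld(records, lambda r: r.location == "Springfield")
print(sorted(world.coreferent_text_pairs()))  # [('a.html', 'b.html')]
```

Selecting events that share properties such as date and location is one plausible way to obtain the ambiguity described above, but it is an assumption of this sketch. Because the relations here are derived only at the document level, mention-level event coreference inside each text would still need to be added, which is the role the dedicated annotation tool plays.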
