MEANTIME, the NewsReader Multilingual Event and Time Corpus

In this paper, we present the NewsReader MEANTIME corpus, a semantically annotated corpus of Wikinews articles. The corpus consists of 480 news articles: 120 English news articles and their translations into Spanish, Italian, and Dutch. MEANTIME contains annotations at different levels. The document-level annotation includes markables (e.g. entity mentions, event mentions, time expressions, and numerical expressions), relations between markables (modeling, for example, temporal information and semantic role labeling), and entity and event intra-document coreference. The corpus-level annotation includes entity and event cross-document coreference. Semantic annotation of the English section was performed manually. For the annotation in Italian, Spanish, and (partially) Dutch, a procedure was devised to automatically project the annotations on the English texts onto the translated texts, based on the manual alignment of the annotated elements. This not only sped up the annotation process but also provided cross-lingual coreference. The English section of the corpus was extended with timeline annotations for the SemEval 2015 TimeLine shared task. The First CLIN Dutch Shared Task at CLIN26 was based on the Dutch section, while the EVALITA 2016 FactA (Event Factuality Annotation) shared task, based on the Italian section, is currently being organized.
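The projection step described above can be illustrated with a minimal sketch. It assumes that each annotated element is a span of token indices and that the manual alignment is a mapping from English spans to target-language spans. The names `Markable`, `project`, and `alignment`, as well as the example data, are hypothetical and do not reflect the corpus's actual file format or tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Markable:
    kind: str         # e.g. "ENTITY_MENTION" or "EVENT_MENTION" (illustrative labels)
    tokens: tuple     # token indices of the mention within its document
    instance_id: str  # identifier of the entity or event the mention refers to


def project(source_markables, alignment):
    """Copy each aligned source markable onto the target-language tokens.

    `alignment` maps a source token span to a target token span.
    Markables without an alignment entry are skipped.
    """
    projected = []
    for m in source_markables:
        target_tokens = alignment.get(m.tokens)
        if target_tokens is None:
            continue  # no manual alignment available for this element
        # Reusing instance_id links the source and target mentions,
        # which is what gives cross-lingual coreference in this sketch.
        projected.append(Markable(m.kind, tuple(target_tokens), m.instance_id))
    return projected


# Hypothetical example: two English mentions, one aligned to a two-token span.
english = [
    Markable("ENTITY_MENTION", (0,), "ent1"),
    Markable("EVENT_MENTION", (1,), "ev1"),
]
alignment = {(0,): (0,), (1,): (1, 2)}
translated = project(english, alignment)
```

Because the projected mention keeps the instance identifier of its English counterpart, mentions in different languages that share an identifier are coreferent by construction.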
