KRAUTS: A German Temporally Annotated News Corpus

In recent years, temporal tagging, i.e., the extraction and normalization of temporal expressions, has become a vibrant research area. Several tools have been made available, and new strategies have been developed. Because of domain-specific challenges, new methods should be evaluated on diverse text types. Despite significant efforts towards multilinguality in temporal tagging, annotated corpora exist for only a single domain in every language except English. In the case of German, for example, only a narrative-style corpus has been manually annotated so far, so German temporal tagging performance on news articles cannot be evaluated. In this paper, we present KRAUTS, a new German temporally annotated corpus containing two subsets of news documents: articles from the daily newspaper DOLOMITEN and from the weekly newspaper DIE ZEIT. Overall, the corpus contains 192 documents with 1,140 annotated temporal expressions and has been made publicly available to further boost research in temporal tagging.