Multilingual and cross-domain temporal tagging

Extracting and normalizing temporal expressions in documents are important steps towards deep text understanding and a prerequisite for many NLP tasks such as information extraction, question answering, and document summarization. The same temporal information can be expressed in many different ways. Once temporal expressions have been identified, however, they can be normalized to a standard format, which makes the temporal information usable independently of the terms and language in which it was expressed. In this paper, we describe the challenges of temporal tagging in different domains, give an overview of existing annotated corpora, and survey existing approaches to temporal tagging. Finally, we present our publicly available temporal tagger HeidelTime, which is easily extensible to further languages because it strictly separates source code from language resources such as patterns and rules. We present a broad evaluation on multiple languages and domains, using existing corpora as well as a newly created corpus for a language/domain combination for which no annotated corpus was previously available.
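To illustrate the idea of normalization and of keeping language resources apart from code, the following is a minimal sketch in Python. It is not HeidelTime's implementation: the names `RESOURCES` and `tag_dates`, the tiny month lexicons, and the regular expressions are all hypothetical, and an ISO 8601-style date string is assumed as the standard format. The point is only that one language-independent function can map differently worded expressions to the same value when the patterns are supplied as data.

```python
import re

# Hypothetical language resources. In a real tagger these would be kept in
# external files so that a new language can be added without touching code.
RESOURCES = {
    "en": {
        "months": {"january": 1, "march": 3, "december": 12},
        "patterns": [
            # e.g. "March 3, 2011"
            r"(?P<month>[A-Za-z]+) (?P<day>\d{1,2}), (?P<year>\d{4})",
            # e.g. "3 March 2011"
            r"(?P<day>\d{1,2}) (?P<month>[A-Za-z]+) (?P<year>\d{4})",
        ],
    },
    "de": {
        "months": {"januar": 1, "märz": 3, "dezember": 12},
        "patterns": [
            # e.g. "3. März 2011"
            r"(?P<day>\d{1,2})\. (?P<month>[A-Za-zäöüÄÖÜ]+) (?P<year>\d{4})",
        ],
    },
}


def tag_dates(text, lang):
    """Return (matched expression, normalized value) pairs found in text.

    The function itself contains no language-specific knowledge; everything
    it needs is looked up in RESOURCES[lang].
    """
    resources = RESOURCES[lang]
    found = []
    for pattern in resources["patterns"]:
        for match in re.finditer(pattern, text):
            month = resources["months"].get(match.group("month").lower())
            if month is None:
                # The word in the month slot is not a known month name.
                continue
            year = int(match.group("year"))
            day = int(match.group("day"))
            found.append((match.group(0), f"{year:04d}-{month:02d}-{day:02d}"))
    return found


if __name__ == "__main__":
    print(tag_dates("The treaty was signed on March 3, 2011.", "en"))
    print(tag_dates("It was signed on 3 March 2011.", "en"))
    print(tag_dates("Am 3. März 2011 wurde der Vertrag unterzeichnet.", "de"))
```

In this sketch, all three inputs yield the same normalized value even though the surface forms and languages differ. Supporting a further language would mean adding another entry to `RESOURCES`, which is the kind of extensibility the separation of code and resources is meant to provide.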
