Temporal expression extraction with extensive feature type selection and a posteriori label adjustment

The automatic extraction of temporal information from written texts is pivotal for many Natural Language Processing applications such as question answering, text summarisation and information retrieval. It makes it possible to filter information and to infer the temporal flow of events.

This paper presents ManTIME, a general-domain temporal expression identification and normalisation system, and systematically explores the impact of different features and training corpora on its performance. The identification phase combines conditional random fields with a post-processing pipeline, whereas the normalisation phase is carried out by NorMA, an open-source rule-based temporal normaliser.

We investigate how performance varies with different feature types. Specifically, we show that WordNet-based features negatively affect overall performance in the identification task, and that adding gazetteers, shallow parsing and propositional noun phrase labels on top of the morpho-lexical features yields no statistically significant difference in the results. We also show that using silver data, either alone or in addition to human-annotated data, does not improve performance.

We evaluate six combinations of training data and post-processing pipeline on the TempEval-3 benchmark test set. In the identification phase, the best run achieved a precision of 0.95, a recall of 0.85 and an F-measure (β = 1) of 0.90. Normalisation accuracy is 0.86 for the type attribute and 0.77 for the value attribute. The proposed approach ranked 3rd among 21 participants in TempEval-3 (Task A) and was the best-performing machine-learning-based system.
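The abstract says that identification pairs a CRF tagger with a post-processing pipeline, but it does not list the post-processing steps. The Python below is a purely illustrative sketch of one simple kind of a posteriori label adjustment, not ManTIME's actual pipeline. It assumes the tagger emits one BIO label per token; the label names `B-TIMEX`, `I-TIMEX` and `O` and the helpers `repair_bio` and `extract_spans` are hypothetical. The sketch repairs inconsistent label transitions and then reads off the temporal expression spans.

```python
from typing import List, Optional, Tuple

# Hypothetical label set: one BIO label per token, as a CRF tagger might emit.
BEGIN, INSIDE, OUTSIDE = "B-TIMEX", "I-TIMEX", "O"

Span = Tuple[int, int, str]  # (start token index, end index exclusive, surface text)


def repair_bio(labels: List[str]) -> List[str]:
    """Relabel any I-TIMEX that does not continue an open expression as B-TIMEX."""
    fixed: List[str] = []
    prev = OUTSIDE
    for label in labels:
        if label == INSIDE and prev not in (BEGIN, INSIDE):
            label = BEGIN  # an expression cannot start with an "inside" label
        fixed.append(label)
        prev = label
    return fixed


def extract_spans(tokens: List[str], labels: List[str]) -> List[Span]:
    """Collect contiguous TIMEX spans from a (repaired) BIO label sequence."""
    spans: List[Span] = []
    start: Optional[int] = None

    def close(end: int) -> None:
        # Emit the currently open span, if any, ending before token `end`.
        nonlocal start
        if start is not None:
            spans.append((start, end, " ".join(tokens[start:end])))
            start = None

    for i, label in enumerate(labels):
        if label == BEGIN:
            close(i)  # a new B-TIMEX ends any expression still open
            start = i
        elif label == INSIDE:
            if start is None:
                start = i  # tolerate unrepaired input
        else:
            close(i)
    close(len(labels))
    return spans


if __name__ == "__main__":
    tokens = ["The", "meeting", "moved", "to", "next", "Friday", "morning", "."]
    # Raw tagger output that starts the expression with I-TIMEX instead of B-TIMEX.
    raw = ["O", "O", "O", "O", "I-TIMEX", "I-TIMEX", "I-TIMEX", "O"]
    print(extract_spans(tokens, repair_bio(raw)))
    # → [(4, 7, 'next Friday morning')]
```

Keeping the repair and the span extraction as separate steps means further adjustment rules can be slotted in between them without touching the extraction logic.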

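As a consistency check on the identification scores, the F-measure with β = 1 is the harmonic mean of precision P and recall R. Substituting the reported values reproduces the stated 0.90. This assumes the standard definitions; the abstract does not say which span-matching criterion the scores refer to.

```latex
F_\beta = \frac{(1+\beta^2)\,P\,R}{\beta^2 P + R},
\qquad
F_1 = \frac{2PR}{P+R}
    = \frac{2 \times 0.95 \times 0.85}{0.95 + 0.85}
    = \frac{1.615}{1.80}
    \approx 0.897 \approx 0.90
```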