Automatic Extraction of Time Expressions Across Domains in French Narratives

The prevalence of temporal references across all types of natural language utterances makes temporal analysis a key issue in Natural Language Processing. This work addresses three research questions: (1) is temporal expression recognition specific to a particular domain? (2) if so, can we characterize domain specificity? and (3) how can subdomain specificity be integrated into a single tool for unified temporal expression extraction? Herein, we assess temporal expression recognition on French documents covering three domains. We present a new corpus of clinical narratives annotated for temporal expressions, and also use existing corpora in the newswire and historical domains. We show that temporal expressions can be extracted with high performance across domains (best F-measure of 0.96, obtained with a CRF model on clinical narratives). We argue that domain adaptation for the extraction of temporal expressions can be achieved with limited effort and should cover pre-processing as well as tasks specific to temporal processing.
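A CRF tagger such as the one mentioned above typically treats temporal expression recognition as token-level sequence labeling, and the F-measure is then computed by comparing predicted and reference mentions. The sketch below is illustrative only. It assumes a BIO encoding (B = first token of a temporal expression, I = inside, O = outside) and strict matching, in which only exact span boundaries count. The functions `bio_to_spans` and `strict_prf` and the example sentence are hypothetical, and this is not necessarily the evaluation protocol used in this work.

```python
from typing import List, Optional, Set, Tuple

# A mention is a (start, end) pair of token indices, end exclusive.
Span = Tuple[int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect temporal-expression spans from a BIO tag sequence (hypothetical helper)."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # A new mention begins (a stray I is treated as a B); close any open one.
            if start is not None:
                spans.add((start, i))
            start = i
        elif tag == "O":
            # Outside any mention: close the open one, if any.
            if start is not None:
                spans.add((start, i))
            start = None
        # tag == "I" with an open mention: the mention simply extends.
    if start is not None:
        spans.add((start, len(tags)))
    return spans


def strict_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Precision, recall and F-measure with exact-boundary span matching (assumed criterion)."""
    g, p = bio_to_spans(gold), bio_to_spans(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Hypothetical clinical sentence with two temporal expressions:
# "12 mars 2013" and "deux jours plus tard".
tokens = ["Patient", "admis", "le", "12", "mars", "2013", ",",
          "opéré", "deux", "jours", "plus", "tard", "."]
gold = ["O", "O", "O", "B", "I", "I", "O", "O", "B", "I", "I", "I", "O"]
# The predicted labels cut the second expression short ("deux jours" only).
pred = ["O", "O", "O", "B", "I", "I", "O", "O", "B", "I", "O", "O", "O"]

print(strict_prf(gold, pred))  # → (0.5, 0.5, 0.5)
```

Under strict matching, the truncated second expression counts as both a false positive and a false negative. A more lenient overlap criterion would score the same prediction higher, so reported F-measures are only comparable when the matching criterion is the same.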
