Normalization of Relative and Incomplete Temporal Expressions in Clinical Narratives

OBJECTIVE To improve the normalization of relative and incomplete temporal expressions (RI-TIMEXes) in clinical narratives.

METHODS We analyzed the RI-TIMEXes in temporally annotated corpora and propose two hypotheses regarding the normalization of RI-TIMEXes in the clinical narrative domain: the anchor point hypothesis and the anchor relation hypothesis. We annotated the RI-TIMEXes in three corpora to study the characteristics of RI-TIMEXes in different domains. This informed the design of our RI-TIMEX normalization system for the clinical domain, which consists of an anchor point classifier, an anchor relation classifier, and a rule-based RI-TIMEX text span parser. We experimented with different feature sets and performed an error analysis for each system component.

RESULTS The annotation confirmed the hypotheses that the RI-TIMEX normalization task can be simplified using two multi-label classifiers. On the held-out test set of the 2012 i2b2 temporal relation challenge, our system achieves an accuracy of 74.68% for anchor point classification, 87.71% for anchor relation classification, and 57.2% for rule-based parsing (82.09% under relaxed matching criteria).

DISCUSSION Experiments with feature sets show, for example, that the verbal tense feature informs anchor relation classification in clinical narratives less than the tokens near the RI-TIMEX do. Error analysis showed that underrepresented anchor point and anchor relation classes are difficult to detect.

CONCLUSIONS We formulate the RI-TIMEX normalization problem as a pair of multi-label classification problems. Considering only RI-TIMEX extraction and normalization, the system achieves a statistically significant improvement over the RI-TIMEX results of the best systems in the 2012 i2b2 challenge.
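To illustrate how the three components described above could fit together, the following is a minimal Python sketch, not the authors' implementation. It assumes the two classifiers have already predicted an anchor point and an anchor relation, and shows only a toy rule-based span parser plus the final date arithmetic. The label names (`"admission"`, `"before"`, `"after"`), the function names, and the tiny rule set are all hypothetical; the abstract does not list the actual classes or rules.

```python
import re
from datetime import date, timedelta

# Hypothetical lookup tables; the real system's rules are not given in the abstract.
WORD_NUMBERS = {"a": 1, "an": 1, "one": 1, "two": 2, "three": 3, "four": 4,
                "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}
UNIT_DAYS = {"day": 1, "week": 7}

# Matches a quantity (digits or a word) followed by a supported time unit.
SPAN_PATTERN = re.compile(r"(?P<qty>\d+|[a-z]+)\s+(?P<unit>day|week)s?")


def parse_offset(span):
    """Toy rule-based span parser: return the offset as a timedelta, or None."""
    match = SPAN_PATTERN.search(span.lower())
    if match is None:
        return None
    qty_text = match.group("qty")
    qty = int(qty_text) if qty_text.isdigit() else WORD_NUMBERS.get(qty_text)
    if qty is None:
        return None
    return timedelta(days=qty * UNIT_DAYS[match.group("unit")])


def normalize(span, anchor_point, anchor_relation, anchors):
    """Combine the (assumed) classifier outputs with the parsed offset.

    anchor_point    -- hypothetical label predicted by the anchor point classifier
    anchor_relation -- hypothetical label ("before" or "after") predicted by the
                       anchor relation classifier
    anchors         -- mapping from anchor point label to a known calendar date
    """
    offset = parse_offset(span)
    if offset is None or anchor_point not in anchors:
        return None
    anchor = anchors[anchor_point]
    if anchor_relation == "after":
        return (anchor + offset).isoformat()
    if anchor_relation == "before":
        return (anchor - offset).isoformat()
    return None


if __name__ == "__main__":
    known_anchors = {"admission": date(2012, 3, 10)}
    print(normalize("two days later", "admission", "after", known_anchors))
    # prints 2012-03-12
```

Separating the anchor decisions from the span parsing mirrors the decomposition the abstract describes: the classifiers decide which date to count from and in which direction, while the rules only need to recover the size of the offset from the text span.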
