A flexible framework for recognizing events, temporal expressions, and temporal relations in clinical text

OBJECTIVE: To provide a natural language processing method for the automatic recognition of events, temporal expressions, and temporal relations in clinical records.

MATERIALS AND METHODS: A combination of supervised, unsupervised, and rule-based methods was used. The supervised methods are conditional random fields and support vector machines. A flexible automated feature selection technique was used to select the best subset of features for each supervised task. The unsupervised methods include Brown clustering on several corpora; because these clusters feed the supervised models, the overall method is semisupervised.

RESULTS: On the 2012 Informatics for Integrating Biology and the Bedside (i2b2) shared task data, we achieved an overall event F1-measure of 0.8045, an overall temporal expression F1-measure of 0.6154, an overall temporal link detection F1-measure of 0.5594, and an end-to-end temporal link detection F1-measure of 0.5258. The most competitive component was our event recognition method, which ranked third out of the 14 participants in the event task.

DISCUSSION: Analysis reveals that the event recognition method has difficulty determining which modifiers to include in or exclude from the event span. The temporal expression recognition method requires significantly more normalization rules, although many of these rules apply only to a small number of cases. Finally, the temporal relation recognition method requires more advanced medical knowledge and could be improved by separating the single discourse relation classifier into multiple, more targeted component classifiers.

CONCLUSIONS: Events and temporal expressions can be recognized accurately by combining supervised and unsupervised methods, even when only minimal medical knowledge is available. Temporal normalization and temporal relation recognition, however, are far more dependent on the modeling of medical knowledge.
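The abstract does not spell out the automated feature selection technique. As a rough illustration only, the sketch below assumes a greedy forward search with a floating backward step: add the feature that helps most, then drop any earlier feature whose removal improves the score. The feature names and the `toy_score` function are hypothetical stand-ins for real feature templates and a cross-validated F1-measure; this is not the authors' implementation.

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def floating_forward_selection(
    features: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float]:
    """Greedy forward selection with a floating backward step (a sketch)."""
    pool = list(features)
    selected: FrozenSet[str] = frozenset()
    best = score(selected)
    while True:
        # Forward step: try adding each unused feature, keep the best one.
        candidates = [f for f in pool if f not in selected]
        if not candidates:
            break
        top = max(candidates, key=lambda f: score(selected | {f}))
        top_score = score(selected | {top})
        if top_score <= best:
            break  # no single addition improves the score
        selected, best = selected | {top}, top_score
        # Floating step: remove features while removal strictly improves the score.
        improved = True
        while improved and len(selected) > 1:
            improved = False
            for f in sorted(selected):
                trial = selected - {f}
                trial_score = score(trial)
                if trial_score > best:
                    selected, best, improved = trial, trial_score, True
                    break
    return selected, best


def toy_score(subset: FrozenSet[str]) -> float:
    """Hypothetical scoring function; a real one would run cross-validation."""
    value = 0.0
    if "word" in subset:
        value += 0.5
    if "pos" in subset:
        value += 0.2
    if "brown_cluster" in subset:
        value += 0.1
    if "word" in subset and "lemma" in subset:
        value -= 0.05  # redundant with the word feature
    return value - 0.01 * len(subset)  # small penalty per feature


chosen, chosen_score = floating_forward_selection(
    ["word", "lemma", "pos", "brown_cluster", "suffix"], toy_score
)
print(sorted(chosen))  # ['brown_cluster', 'pos', 'word']
```

Because every accepted step strictly increases the score and the number of feature subsets is finite, the search always terminates. In practice the score function dominates the cost, since each call would retrain and evaluate a classifier.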
