Temporal indexing of medical entity in Chinese clinical notes

Background: The goal of temporal indexing is to assign a time of occurrence or a time interval to each medical entity in clinical notes, so that all medical entities can be placed on a unified timeline. Such a timeline can aid the understanding of clinical notes and the downstream use of medical entities. Several shared tasks on temporal relations for medical entities in English clinical notes have been organized in the past few years, such as the 2012 i2b2 NLP challenge and the 2015 and 2016 Clinical TempEval challenges. Many heuristic rule-based and machine learning-based systems were developed for these tasks. In recent years, deep neural network models have shown great potential on many problems, including relation classification.

Methods: In this paper, we propose a recurrent convolutional neural network (RNN-CNN) model for the temporal indexing task. The model consists of four layers: (1) an input layer, which generates a representation for each context word of a medical entity or temporal expression; (2) an LSTM (long short-term memory) layer, which learns the context information of each word in a sentence and outputs a new sequence of word representations; (3) a CNN layer, which extracts meaningful features from the sentence and outputs a new representation for the medical entity or temporal expression; and (4) an output layer, which takes the representations of the medical entity and the temporal expression, together with relation features, as input and classifies the temporal relation. Finally, the time or time interval for each medical entity is selected directly according to the probability of each temporal relation predicted by the model.

Results: To assess the performance of the RNN-CNN model on the temporal indexing task, we also evaluated several baseline methods: a rule-based method, a support vector machine (SVM), a convolutional neural network (CNN) and a recurrent neural network (RNN). Experiments on a manually annotated corpus (563 clinical notes with 12,611 medical entities and 4,006 temporal expressions) show that the RNN-CNN model achieves the best F1-score, 75.97%, for temporal relation classification and the best accuracy, 71.96%, for temporal indexing.

Conclusions: Neural network methods perform much better than the traditional rule-based and SVM-based methods, as they can capture more semantic information from the context of medical entities and temporal expressions. In addition, all of our methods perform much better when indexing to a specific time than when indexing to a time interval, so improving time interval indexing will be the main focus of our future work.
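
The final selection step described in Methods can be illustrated with a minimal Python sketch. The abstract does not give the relation label set or the exact selection rule, so everything here is an assumption made for illustration: the labels OVERLAP, BEFORE and NONE, the function name select_time, and the rule of picking the candidate temporal expression whose most probable relation has the highest probability while skipping candidates predicted as unrelated.

```python
from typing import Dict, Optional, Tuple

# Hypothetical label meaning "no temporal relation"; the abstract does not
# list the relation types used by the model.
NO_RELATION = "NONE"


def select_time(
    candidates: Dict[str, Dict[str, float]]
) -> Optional[Tuple[str, str, float]]:
    """Choose a temporal expression for one medical entity.

    `candidates` maps each candidate temporal expression to the predicted
    probability distribution over relation labels. Returns a tuple
    (temporal expression, relation label, probability), or None when every
    candidate is most likely unrelated to the entity.
    """
    best: Optional[Tuple[str, str, float]] = None
    for timex, probs in candidates.items():
        # Most probable relation label for this candidate.
        label, p = max(probs.items(), key=lambda kv: kv[1])
        if label == NO_RELATION:
            continue  # candidate is predicted as unrelated; skip it
        if best is None or p > best[2]:
            best = (timex, label, p)
    return best


# Made-up probabilities for one entity and two candidate dates.
example = {
    "2016-03-01": {"OVERLAP": 0.81, "BEFORE": 0.10, "NONE": 0.09},
    "2016-03-05": {"OVERLAP": 0.20, "BEFORE": 0.15, "NONE": 0.65},
}
result = select_time(example)
```

In this made-up example the entity would be indexed to 2016-03-01, since the second candidate is most likely unrelated. A time interval could be handled the same way by treating its start and end expressions as separate candidates, although the abstract does not say how the authors do this.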
