Entity recognition in Chinese clinical text using attention-based CNN-LSTM-CRF

Background: Clinical entity recognition, a fundamental task in clinical text processing, has attracted a great deal of attention during the last decade. However, most studies focus on clinical text in English rather than in other languages. Recently, a few researchers have begun to study entity recognition in Chinese clinical text.

Methods: In this paper, a novel deep neural network, called attention-based CNN-LSTM-CRF, is proposed to recognize entities in Chinese clinical text. Attention-based CNN-LSTM-CRF extends LSTM-CRF by introducing a CNN (convolutional neural network) layer after the input layer to capture local context information of words of interest, and an attention layer before the CRF layer to select relevant words in the same sentence.

Results: To evaluate the proposed method, we compare it with two other currently popular methods, CRF (conditional random field) and LSTM-CRF, on two benchmark datasets. One dataset is publicly available and contains only contiguous clinical entities; the other was constructed by us and contains both contiguous and discontiguous clinical entities. Experimental results show that attention-based CNN-LSTM-CRF outperforms CRF and LSTM-CRF.

Conclusions: CNN and the attention mechanism are each beneficial to an LSTM-CRF-based Chinese clinical entity recognition system, regardless of whether only contiguous clinical entities are considered. The contribution of the attention mechanism is greater than that of CNN.
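The two components that the method adds to LSTM-CRF can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names (`conv1d_same`, `attend`), the ReLU activation, and the dot-product attention scoring are assumptions made for illustration, and the LSTM and CRF layers are left out entirely. In the demo at the end, the convolution output is passed straight to the attention step only as a stand-in for the LSTM hidden states.

```python
import math


def conv1d_same(vectors, kernels, bias):
    """Window convolution over a sentence with zero padding (output length = input length).

    vectors: list of T word/character vectors, each of length d
    kernels: list of F filters, each a list of k rows of length d (k odd)
    bias:    list of F bias terms
    """
    k = len(kernels[0])
    d = len(vectors[0])
    pad = k // 2
    zero = [0.0] * d
    padded = [zero] * pad + vectors + [zero] * pad
    out = []
    for t in range(len(vectors)):
        window = padded[t:t + k]  # the k positions centred on position t
        feats = []
        for f, kernel in enumerate(kernels):
            s = bias[f] + sum(
                kernel[i][j] * window[i][j] for i in range(k) for j in range(d)
            )
            feats.append(max(0.0, s))  # ReLU (assumed activation)
        out.append(feats)
    return out


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def attend(hidden):
    """Dot-product attention of every position over all positions in the sentence.

    Returns (contexts, weights): contexts[t] is the attention-weighted sum of
    the hidden vectors for position t, weights[t] is its distribution over positions.
    """
    T = len(hidden)
    d = len(hidden[0])
    contexts, weights = [], []
    for t in range(T):
        scores = [sum(a * b for a, b in zip(hidden[t], hidden[s])) for s in range(T)]
        w = softmax(scores)
        contexts.append([sum(w[s] * hidden[s][j] for s in range(T)) for j in range(d)])
        weights.append(w)
    return contexts, weights


# Toy sentence of three tokens with 2-dimensional embeddings (made-up values).
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# One filter spanning a window of three tokens, all weights set to 1.
kernels = [[[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]]
local = conv1d_same(embeddings, kernels, bias=[0.0])
# Stand-in only: a real model would run a bidirectional LSTM over `local` first.
contexts, weights = attend(local)
```

In a full model, the attended vectors would be scored against the label set and decoded by the CRF layer; that step is omitted here.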
