Clinical Named Entity Recognition from Chinese Electronic Medical Records Based on Deep Learning Pretraining

Background: Clinical named entity recognition is a basic task in mining electronic medical record text. Chinese electronic medical records pose particular challenges: the text contains many compound entities, sentence components are frequently omitted, and entity boundaries are unclear. Moreover, corpora of Chinese electronic medical records are difficult to obtain. Methods: To address these characteristics, this study proposes a Chinese clinical entity recognition model based on deep learning pretraining. The model uses word embeddings trained on a domain corpus and fine-tunes an entity recognition model pretrained on a related corpus. BiLSTM and Transformer are each used as the feature extractor to identify four types of clinical entities (diseases, symptoms, drugs, and operations) in the text of Chinese electronic medical records. Results: The model achieved 75.06% Macro-P, 76.40% Macro-R, and 75.72% Macro-F1 on the test dataset. Conclusions: These experiments show that the proposed Chinese clinical entity recognition model based on deep learning pretraining can effectively improve recognition performance.
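
For readers unfamiliar with the macro-averaged metrics reported above, the following minimal sketch shows one common way to compute them over the four entity types. It is an illustration, not the authors' evaluation code: the function names and the per-type counts are hypothetical, and it assumes Macro-F1 is the harmonic mean of Macro-P and Macro-R (some evaluations instead average the per-type F1 scores). Under that assumption, the reported 75.06% and 76.40% do yield 75.72%.

```python
from typing import Dict, Tuple


def harmonic_mean(p: float, r: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def macro_prf(counts: Dict[str, Tuple[int, int, int]]) -> Tuple[float, float, float]:
    """Macro-averaged precision, recall, and F1.

    `counts` maps each entity type to (true positives, false positives,
    false negatives). Precision and recall are computed per type and then
    averaged with equal weight per type.
    """
    precisions, recalls = [], []
    for tp, fp, fn in counts.values():
        precisions.append(tp / (tp + fp) if (tp + fp) else 0.0)
        recalls.append(tp / (tp + fn) if (tp + fn) else 0.0)
    macro_p = sum(precisions) / len(precisions)
    macro_r = sum(recalls) / len(recalls)
    # Assumption: Macro-F1 is derived from Macro-P and Macro-R.
    return macro_p, macro_r, harmonic_mean(macro_p, macro_r)


# Hypothetical counts, for illustration only (not taken from the paper).
example_counts = {
    "disease": (8, 2, 2),
    "symptom": (6, 4, 2),
    "drug": (9, 1, 3),
    "operation": (5, 5, 5),
}
macro_p, macro_r, macro_f1 = macro_prf(example_counts)

# Consistency check on the figures reported in the abstract.
reported_f1 = round(harmonic_mean(75.06, 76.40), 2)
```

Because each entity type contributes equally to the average, a rare type such as operations weighs as much as a frequent one such as symptoms, which is why macro averaging is often preferred when class sizes are unbalanced.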
