Research on Chinese medical named entity recognition based on collaborative cooperation of multiple neural network models

Medical named entity recognition (NER) in Chinese electronic medical records (CEMRs) has drawn much research attention and is a vital prerequisite for extracting high-value medical information. In 2018, the China Health Information Processing Conference (CHIP2018) organized a medical NER academic competition aimed at extracting three types of malignant tumor entities from CEMRs. Because the three entity types are highly domain-specific and interdependent, they cannot be extracted with a single neural network model. Based on a comprehensive study of the three entity types and their interdependencies, we propose an approach based on the collaborative cooperation of multiple neural network models, consisting of two BiLSTM-CRF models and a CNN model. To tackle the problem that the target-scene dataset is small and entity distributions are sparse, we introduce non-target-scene datasets and propose sentence-level neural network model transfer learning. Using 30,000 real-world CEMRs, we pre-train medical domain-specific Chinese character embeddings with word2vec, GloVe and ELMo, and apply each of them to our approach to evaluate the effect of pre-trained language models on Chinese medical NER. As control experiments, we also apply Gated Recurrent Units (GRUs) to our approach. Our approach achieves an overall F1-score of 87.60%, which is, to the best of our knowledge, state-of-the-art performance. In addition, our approach won first place in the medical NER academic competition organized by the 2019 China Conference on Knowledge Graph and Semantic Computing, which demonstrates its strong generalization ability.
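For readers unfamiliar with how an F1-score is typically computed for NER, the following is a minimal sketch of entity-level precision, recall and F1 over BIO-tagged character sequences. It assumes strict matching (an entity counts as correct only if both its boundaries and its type match) and a BIO tagging scheme; neither the competition's exact scoring rules nor the tagging scheme are stated above. The labels `TYPE_A` and `TYPE_B` and the function names are hypothetical placeholders.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type); a hypothetical representation
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sequence (strict reading)."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any entity still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and tag[2:] == etype
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]):
    """Micro-averaged entity-level precision, recall and F1 (assumed metric)."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_to_spans(g), bio_to_spans(p)
        tp += len(gs & ps)      # exact boundary and type match
        n_gold += len(gs)
        n_pred += len(ps)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example with placeholder entity types (not the competition's labels).
gold = [["B-TYPE_A", "I-TYPE_A", "O", "B-TYPE_B"]]
pred = [["B-TYPE_A", "I-TYPE_A", "O", "O"]]
precision, recall, f1 = entity_f1(gold, pred)
```

In the toy example, the prediction recovers one of the two gold entities and proposes no spurious ones, so precision is higher than recall and F1 lies between the two.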
