Named Entity Recognition Using BERT BiLSTM CRF for Chinese Electronic Health Records

With the generation and accumulation of massive electronic health records (EHRs), how to effectively extract valuable medical information from them has become a popular research topic. Named entity recognition (NER) is an essential natural language processing (NLP) task in medical information extraction. This paper presents our efforts to apply neural network approaches to this task. Using the Chinese EHR data provided by CCKS 2019 and the Second Affiliated Hospital of Soochow University (SAHSU), we compared several neural NER models, including BiLSTM, together with two pre-trained language models, word2vec and BERT. We found that the BERT-BiLSTM-CRF model achieves an F1 score of approximately 75%, outperforming all other models in our tests.
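To make the BERT-BiLSTM-CRF architecture concrete, the following is a minimal sketch in PyTorch, assuming the Hugging Face `transformers` BertModel and the `pytorch-crf` package (`torchcrf.CRF`); the checkpoint name, hidden size, dropout rate, and tag set are illustrative assumptions, not values reported in the paper.

```python
# Minimal sketch of a BERT-BiLSTM-CRF tagger (hyper-parameters are illustrative).
import torch
import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF


class BertBiLstmCrf(nn.Module):
    def __init__(self, num_tags, bert_name="bert-base-chinese",
                 lstm_hidden=256, dropout=0.1):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)       # contextual character embeddings
        self.lstm = nn.LSTM(self.bert.config.hidden_size,      # BiLSTM over BERT outputs
                            lstm_hidden, batch_first=True,
                            bidirectional=True)
        self.dropout = nn.Dropout(dropout)
        self.emit = nn.Linear(2 * lstm_hidden, num_tags)       # per-token emission scores
        self.crf = CRF(num_tags, batch_first=True)             # label-transition modelling and decoding

    def forward(self, input_ids, attention_mask, tags=None):
        hidden = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        hidden, _ = self.lstm(hidden)
        emissions = self.emit(self.dropout(hidden))
        mask = attention_mask.bool()
        if tags is not None:                                    # training: negative log-likelihood loss
            return -self.crf(emissions, tags, mask=mask, reduction="mean")
        return self.crf.decode(emissions, mask=mask)            # inference: best tag sequence per sentence
```

In this setup BERT supplies character-level contextual representations, the BiLSTM re-encodes them for the tagging task, and the CRF layer enforces valid BIO-style label transitions when decoding; the word2vec baseline mentioned above would simply replace the BERT encoder with static embeddings.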
