A Bidirectional LSTM and Conditional Random Fields Approach to Medical Named Entity Recognition

Medical named entity recognition is a fundamental and essential task in medical natural language processing. It aims to identify medical concepts or terminology, such as diseases, drugs, treatments, and procedures, in unstructured medical text. A model based on a bidirectional LSTM and conditional random fields (Bi-LSTM-CRF) is proposed for medical named entity recognition. Our model contains three layers and relies on character-based word representations learned from the supervised corpus. The Bi-LSTM-CRF model can learn the characteristics of a given dataset. Experiments on the publicly available NCBI Disease Corpus, used as the standard evaluation dataset, show that our approach achieves an F1 measure of 0.8022 and outperforms a number of widely used baseline methods.
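The abstract describes a CRF output layer on top of a Bi-LSTM and reports an F1 measure. Two parts of that pipeline can be shown without a trained network: Viterbi decoding, which picks the best tag sequence under a linear-chain CRF, and exact-match entity-level F1. The following plain-Python sketch rests on several assumptions. The emission scores are hand-picked stand-ins for Bi-LSTM outputs. The BIO tag set uses a single hypothetical "Disease" type. The names `viterbi_decode`, `bio_spans` and `entity_f1` are illustrative helpers and do not come from the paper. The abstract does not state the tagging scheme or the scoring script behind the 0.8022 figure.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical BIO tag set with a single entity type (disease mentions).
TAGS = ["O", "B-Disease", "I-Disease"]


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag path and its score under a linear-chain CRF.

    emissions[t][j]   -- score of tag j at token t (in a Bi-LSTM-CRF, the Bi-LSTM output)
    transitions[i][j] -- score of moving from tag i to tag j (a learned CRF parameter)
    Assumes at least one token.
    """
    n_tags = len(transitions)
    # best[j]: best score of any partial path that ends in tag j at the current token
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(n_tags):
            # score of entering tag j from every previous tag i; keep the best one
            scores = [best[i] + transitions[i][j] for i in range(n_tags)]
            i_best = max(range(n_tags), key=scores.__getitem__)
            new_best.append(scores[i_best] + emissions[t][j])
            pointers.append(i_best)
        best = new_best
        backpointers.append(pointers)
    # start from the best final tag and follow the back-pointers to the first token
    last = max(range(n_tags), key=best.__getitem__)
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


def bio_spans(tags: Sequence[str]) -> Set[Tuple[int, int, str]]:
    """Collect (start, end_exclusive, type) entity spans from a BIO tag sequence."""
    spans: Set[Tuple[int, int, str]] = set()
    start, etype = None, None
    for k, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes a final span
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            if start is not None:
                spans.add((start, k, etype))
            # a B- tag (or a stray I- tag) opens a new span; "O" opens none
            start, etype = (k, tag[2:]) if tag != "O" else (None, None)
    return spans


def entity_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Exact-match entity-level F1: a predicted span counts only if boundaries and type match."""
    g, p = bio_spans(gold), bio_spans(pred)
    tp = len(g & p)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)
```

Consider the tokens "Patients with breast cancer" with emission rows `[2.0, 0.5, 0.1]`, `[2.0, 0.2, 0.1]`, `[0.5, 1.5, 1.0]` and `[0.3, 0.8, 1.2]` (ordered O, B, I), and a transition matrix that strongly penalises O to I-Disease. With these inputs, `viterbi_decode` returns the tags O, O, B-Disease, I-Disease. This sketch covers decoding only. Training a CRF layer also requires the forward algorithm to normalise over all tag paths, and that step is omitted here.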

Kai Xu | Tianyong Hao | Wenyin Liu | Zhanfan Zhou
