Long short-term memory RNN for biomedical named entity recognition

Background
Biomedical named entity recognition (BNER) is a crucial initial step of information extraction in the biomedical domain. The task is typically modeled as a sequence labeling problem. Various machine learning algorithms, such as Conditional Random Fields (CRFs), have been used successfully for this task. However, these state-of-the-art BNER systems depend largely on hand-crafted features.

Results
We present a recurrent neural network (RNN) framework based on word embeddings and character-level representations. On top of the neural network, we use a CRF layer to jointly decode the labels for the whole sentence. In our approach, the bidirectional variant of the RNN models contextual information from both directions, and the long short-term memory (LSTM) unit models long-range dependencies in the sequence; both kinds of information are useful for this task. Although our models use word embeddings and character embeddings as their only features, the bidirectional LSTM-RNN (BLSTM-RNN) model achieves state-of-the-art performance: 86.55% F1 on the BioCreative II gene mention (GM) corpus and 73.79% F1 on the JNLPBA 2004 corpus.

Conclusions
Our neural network architecture can be used successfully for BNER without any manual feature engineering. Experimental results show that domain-specific pre-trained word embeddings and character-level representations improve the performance of the LSTM-RNN models. On the GM corpus, our model achieves performance comparable to that of systems using complex hand-crafted features. On the JNLPBA corpus, our model achieves the best results, outperforming the previous top-performing systems. The source code of our method is freely available under the GPL at https://github.com/lvchen1989/BNER.
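The following sketch shows what "jointly decode the labels for the whole sentence" means for the CRF layer. It uses Viterbi search over per-token tag scores. This is a minimal illustration under stated assumptions and not the authors' implementation, which is in the repository linked above. The emission scores stand in for BLSTM outputs and are hand-picked toy numbers. The transition matrix is hand-set rather than learned. The three-tag BIO scheme and the names `viterbi_decode`, `emissions` and `transitions` are hypothetical.

```python
from typing import List, Tuple


def viterbi_decode(
    emissions: List[List[float]],    # emissions[t][j]: score of tag j at token t (stand-in for BLSTM output)
    transitions: List[List[float]],  # transitions[i][j]: score of moving from tag i to tag j
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence and its total score.

    Assumes at least one token. The score of a sequence is the sum of its
    emission scores plus the sum of its transition scores, as in a linear-chain CRF.
    """
    n_tags = len(transitions)
    # score[j]: best score of any tag path that ends in tag j at the current token
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score: List[float] = []
        bp: List[int] = []
        for j in range(n_tags):
            # Best previous tag i from which to arrive at tag j
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            bp.append(best_i)
        score = new_score
        backpointers.append(bp)
    # Start from the best final tag and follow the backpointers to the first token
    best_last = max(range(n_tags), key=lambda j: score[j])
    path = [best_last]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    path.reverse()
    return path, score[best_last]


# Hypothetical toy example: BIO tags and hand-set scores (not from the paper)
TAGS = ["O", "B", "I"]
NEG = -10.0  # penalty for the invalid O -> I transition
transitions = [
    [0.0, 0.0, NEG],  # from O
    [0.0, 0.0, 0.0],  # from B
    [0.0, 0.0, 0.0],  # from I
]
emissions = [  # toy per-token scores for "of IL-2 receptor"
    [2.0, 0.0, 0.0],
    [0.5, 1.0, 1.2],
    [0.0, 0.0, 1.5],
]
path, best = viterbi_decode(emissions, transitions)
print([TAGS[i] for i in path], best)  # ['O', 'B', 'I'] 4.5
```

With these toy scores, choosing the best tag for each token independently would label "IL-2" as I directly after O, which is an invalid BIO sequence. The transition penalty lets joint decoding recover O B I instead. In a trained BLSTM-CRF model, both score tables would be learned from data rather than set by hand.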
