Comparison of named entity recognition methodologies in biomedical documents

Background: Biomedical named entity recognition (Bio-NER), which identifies biomedical terms in text such as RNA, protein, cell type, cell line, and DNA, is one of the most elementary and core tasks in biomedical knowledge discovery from texts. The system described here is developed using the BioNLP/NLPBA 2004 shared task, and experiments are conducted on the training and evaluation sets provided by the task organizers.

Results: Our results show that, compared with a baseline F1 score of 70.09%, the Jordan-type and Elman-type RNN algorithms reach F1 scores of approximately 60.53% and 58.80%, respectively. With a CRF as the machine learning algorithm, CCA, GloVe, and Word2Vec reach F1 scores of 72.73%, 72.74%, and 72.82%, respectively.

Conclusions: Using word embeddings constructed through unsupervised learning saves the time and cost required to construct the learning data.

Hye-Jeong Song | Chan-Young Park | Jong-Dae Kim | Yu-Seop Kim | Byeong-Cheol Jo
