Biomedical named entity recognition system

We propose a machine learning approach that uses a Maximum Entropy (ME) model to build a Named Entity Recognition (NER) classifier for extracting biomedical names from text. In our experiments, the ME model combines a variety of linguistic features to assign class labels and positions within an entity sequence, and a postprocessing strategy then corrects the resulting tag sequences; together these produce a state-of-the-art solution. On the GENIA corpus, the system achieves an F-score of 68.2% for semantic classification into 23 categories and an F-score of 78.1% for entity identification.
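
The postprocessing of tag sequences mentioned above can be illustrated with a minimal sketch. The abstract does not state the tag scheme or the correction rules, so everything here is an assumption: the sketch supposes a BIO-style encoding (`B-` opens an entity, `I-` continues it, `O` is outside any entity) and one hypothetical rule, implemented in the illustrative function `repair_bio`, that promotes an `I-` tag to `B-` when it does not continue an entity of the same class.

```python
def repair_bio(tags):
    """Return a copy of `tags` in which every entity starts with a B- tag.

    Assumed scheme (not taken from the paper): "B-<class>", "I-<class>", "O".
    """
    fixed = []
    prev_class = None  # class of the previous token, None when outside an entity
    for tag in tags:
        if tag == "O":
            fixed.append(tag)
            prev_class = None
            continue
        prefix, _, entity_class = tag.partition("-")
        if prefix == "I" and prev_class != entity_class:
            # An I- tag cannot open an entity or continue one of another class,
            # so treat it as the start of a new entity.
            tag = "B-" + entity_class
        fixed.append(tag)
        prev_class = entity_class
    return fixed


# Hypothetical classifier output containing two inconsistent I- tags.
predicted = ["O", "I-protein", "I-protein", "O", "B-DNA", "I-protein"]
print(repair_bio(predicted))
```

A rule of this kind only enforces well-formed sequences; the corrections actually used by the system may differ.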
