Paper information - Exploring Deep Knowledge Resources in Biomedical Name Recognition

In this paper, we present a named entity recognition system for the biomedical domain. To deal with the special phenomena of this domain, various evidential features are proposed and integrated through a Hidden Markov Model (HMM). In addition, a Support Vector Machine (SVM) plus sigmoid is proposed to resolve the data sparseness problem in our system. Besides the widely used lexical-level features, such as word formation pattern, morphological pattern, out-domain POS and semantic trigger, we also explore the name alias phenomenon, the cascaded entity name phenomenon, the use of both a closed dictionary from the training corpus and an open dictionary from the database term list SwissProt and the alias list LocusLink, abbreviation resolution, and in-domain POS using the GENIA corpus.

1. The Baseline System

1.1 Hidden Markov Model

In this paper, we use the Hidden Markov Model (HMM) as described in Zhou et al. (2002). Given an output sequence $O_1^n = o_1 o_2 \ldots o_n$, the system finds the most likely state sequence $S_1^n = s_1 s_2 \ldots s_n$ that maximizes $P(S_1^n \mid O_1^n)$.
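To illustrate what "finding the most likely state sequence" means in practice, the following is a minimal sketch of Viterbi decoding [6] for a textbook first-order HMM. It is not the formulation of Zhou et al. (2002), which may differ in how the model is factored. The function name `viterbi`, the two tags `O` and `PROT`, and every probability in the toy tables are assumptions made up for this example.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely state sequence under a first-order HMM.

    All model parameters are log-probabilities stored in nested dicts.
    """
    if not observations:
        return []
    # best[t][s]: highest log-probability of any path ending in state s at step t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    # back[t][s]: predecessor of s on that best path
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log_emit[s][observations[t]]
            back[t][s] = prev
    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def to_log(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {
        k: to_log(v) if isinstance(v, dict) else math.log(v)
        for k, v in table.items()
    }


# Hypothetical toy model: "PROT" marks a protein-name token, "O" anything else.
STATES = ["O", "PROT"]
LOG_START = to_log({"O": 0.8, "PROT": 0.2})
LOG_TRANS = to_log({
    "O": {"O": 0.7, "PROT": 0.3},
    "PROT": {"O": 0.4, "PROT": 0.6},
})
LOG_EMIT = to_log({
    "O": {"the": 0.6, "IL-2": 0.05, "gene": 0.35},
    "PROT": {"the": 0.05, "IL-2": 0.7, "gene": 0.25},
})
OBS = ["the", "IL-2", "gene"]

if __name__ == "__main__":
    print(list(zip(OBS, viterbi(OBS, STATES, LOG_START, LOG_TRANS, LOG_EMIT))))
```

Working in log space avoids numeric underflow on long sequences. The dynamic program costs O(n * |states|^2) time for a sequence of n tokens.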

Jian Su | Guodong Zhou

[1] Jian Su, et al. Effective Adaptation of Hidden Markov Model-based Named Entity Recognizer for Biomedical Domain, 2003, BioNLP@ACL.

[2] Marti A. Hearst, et al. A Simple Algorithm for Identifying Abbreviation Definitions in Biomedical Text, 2002, Pacific Symposium on Biocomputing.

[3] Jin-Dong Kim, et al. The GENIA corpus: an annotated research abstract corpus in molecular biology domain, 2002.

[4] Vladimir N. Vapnik, et al. The Nature of Statistical Learning Theory, 2000, Statistics for Engineering and Information Science.

[5] Stanley F. Chen, et al. An Empirical Study of Smoothing Techniques for Language Modeling, 1996, ACL.

[6] Andrew J. Viterbi, et al. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm, 1967, IEEE Trans. Inf. Theory.

[7] John Platt, et al. Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods, 1999.

[8] Jian Su, et al. Named Entity Recognition using an HMM-based Chunk Tagger, 2002, ACL.