Exploring Deep Knowledge Resources in Biomedical Name Recognition

In this paper, we present a named entity recognition system for the biomedical domain. To deal with the special phenomena of this domain, we propose various evidential features and integrate them through a Hidden Markov Model (HMM). In addition, a Support Vector Machine (SVM) plus sigmoid is proposed to resolve the data sparseness problem in our system. Besides the widely used lexical-level features, such as word formation pattern, morphological pattern, out-of-domain POS and semantic trigger, we also explore the name alias phenomenon, the cascaded entity name phenomenon, the use of both a closed dictionary from the training corpus and an open dictionary from the database term list SwissProt and the alias list LocusLink, abbreviation resolution, and in-domain POS using the GENIA corpus.

1. The Baseline System

1.1 Hidden Markov Model

In this paper, we use the Hidden Markov Model (HMM) as described in Zhou et al. (2002). Given an output sequence $O_1^n = o_1 o_2 \ldots o_n$, the system finds the most likely state sequence $S_1^n = s_1 s_2 \ldots s_n$ that maximizes $P(S_1^n \mid O_1^n)$ as follows: