A Syntactic Parser with Semantic Filtering for Biomedical Text

Because of the large volume of online literature in biomedicine, potentially useful information is often underutilized by researchers. Natural language processing techniques are increasingly used to improve access to this literature, typically by extracting specific information on genes, proteins, and other phenomena, including relationships such as activation and inhibition. Some extraction systems use pattern matching or underspecified syntactic methods to yield high recall, while others employ semantic processing aimed at a narrowly focused target to produce high precision. We are experimenting with preliminary syntactic parsing of biomedical text followed by semantic filtering to combine the strongest features of both approaches. The Genescene parser [1] identifies syntactic predications that have simple noun phrases as arguments and verbs or prepositions as predicates. It identifies a wide range of relations but does not assign semantic labels to the predications. SemGen [2] extracts semantic relationships focused on the etiology of genetic diseases and uses the Unified Medical Language System