Semantic Role Labeling for Biomedical Corpus Using Maximum Entropy Classifier

Semantic role labeling (SRL) is a natural language processing (NLP) task that extracts shallow semantic representations from sentences. In this paper, we construct a biomedical proposition bank and train a biomedical SRL system that can facilitate relation extraction and information retrieval in the biomedical domain. First, we construct a proposition bank, GenPropBank, on top of the GENIA TreeBank, following the Penn PropBank annotation scheme. Second, we use GenPropBank to train a biomedical SRL system with a maximum entropy classifier. Our experimental results show that a newswire SRL system that achieves an F1 of 85.56% in the newswire domain maintains an F1 of only 65.43% when ported to the biomedical domain. Training on our annotated biomedical corpus increases that F1 by 19.2%.
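
To make the classification step concrete, the following is a minimal sketch of a maximum entropy (multinomial logistic regression) role classifier in plain Python. It is an illustration under stated assumptions, not the system described here: the feature names (phrase type, position, voice, predicate, and a position-voice conjunction), the toy training examples, and the hyperparameters are all hypothetical, and training uses simple batch gradient ascent rather than any particular optimizer from the paper.

```python
import math
from collections import defaultdict

def predict_proba(weights, feats, labels):
    """Return p(label | feats) under a log-linear (maximum entropy) model."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in feats) for y in labels}
    m = max(scores.values())                      # subtract max for stability
    exps = {y: math.exp(s - m) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}

def train_maxent(examples, labels, epochs=200, lr=0.5, l2=0.01):
    """Batch gradient ascent on the L2-regularized conditional log-likelihood."""
    weights = defaultdict(float)
    n = len(examples)
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, gold in examples:
            probs = predict_proba(weights, feats, labels)
            for f in feats:
                grad[(f, gold)] += 1.0            # empirical feature count
                for y in labels:
                    grad[(f, y)] -= probs[y]      # expected feature count
        for key, g in grad.items():
            weights[key] += lr * (g / n - l2 * weights[key])
    return weights

def featurize(phrase, position, voice, predicate):
    """Hypothetical feature set for one candidate constituent."""
    return {
        "phrase=" + phrase,
        "position=" + position,
        "voice=" + voice,
        "pred=" + predicate,
        "pos+voice=" + position + "|" + voice,    # conjoined feature
    }

# Toy training data (invented for illustration only).
LABELS = ["ARG0", "ARG1", "ARGM-LOC"]
TRAIN = [
    (featurize("NP", "before", "active", "activate"), "ARG0"),
    (featurize("NP", "after", "active", "activate"), "ARG1"),
    (featurize("NP", "before", "passive", "activate"), "ARG1"),
    (featurize("PP", "after", "passive", "activate"), "ARG0"),
    (featurize("PP", "after", "active", "express"), "ARGM-LOC"),
    (featurize("NP", "before", "active", "express"), "ARG0"),
    (featurize("NP", "after", "active", "express"), "ARG1"),
]

weights = train_maxent(TRAIN, LABELS)

# Classify a constituent of a predicate that never occurred in training.
probs = predict_proba(weights, featurize("NP", "before", "active", "inhibit"), LABELS)
best = max(probs, key=probs.get)
print(best, probs)
```

In this sketch each candidate constituent is labeled independently. A full SRL system would typically also identify the candidate constituents from the parse tree and may enforce constraints across the arguments of one predicate; neither step is shown.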