Paper Information - An Evaluation of the Effect of Automatic Preprocessing on Syntactic Parsing for Biomedical Relation Extraction

Relation extraction (RE) is an important text mining task that forms the basis for more complex and advanced tasks. In state-of-the-art RE approaches, syntactic information obtained through parsing plays a crucial role. In the context of biomedical RE, previous studies report using various automatic preprocessing techniques before parsing the input text. However, these studies do not specify to what extent such techniques improve RE results, or to what extent they are corpus-specific or parser-specific. In this paper, we address these issues by using various preprocessing techniques, two syntactic tree kernel-based RE approaches, and two different parsers on 5 widely used benchmark biomedical corpora for the protein-protein interaction (PPI) extraction task. We also analyze various corpus characteristics to verify whether they correlate with the RE results obtained. These analyses can also be used to compare the 5 PPI corpora.

Alberto Lavelli | Md. Faisal Mahbub Chowdhury
