DEEPER: A Full Parsing Based Approach to Protein Relation Extraction

Lexical variance in biomedical texts poses a challenge to automatic protein relation mining. We therefore propose a new approach that builds feature vectors only from general language structures, namely parse and dependency information. Standard machine learning algorithms can then use these vectors to decide whether a sentence describes a protein interaction. Because the approach does not depend on specific interaction keywords, it is applicable to heterogeneous corpora. Evaluation on benchmark datasets shows that our method is competitive with existing state-of-the-art algorithms for the extraction of protein interactions.
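
To make the idea of keyword-free features concrete, the following is a minimal sketch and not the feature set used in the paper, which this abstract does not specify. It assumes a sentence has already been parsed into dependency edges and that protein names have been replaced by placeholders. The function names `dependency_path` and `feature_vector`, the label inventory `LABELS`, and the toy parse are all hypothetical and chosen only for illustration.

```python
from collections import Counter, deque


def dependency_path(edges, start, goal):
    """Return the dependency labels on the shortest undirected path
    from start to goal, or None if the two tokens are not connected."""
    graph = {}
    for head, dependent, label in edges:
        graph.setdefault(head, []).append((dependent, label))
        graph.setdefault(dependent, []).append((head, label))

    # Breadth-first search, so the first path found is a shortest one.
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, labels = queue.popleft()
        if node == goal:
            return labels
        for neighbour, label in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, labels + [label]))
    return None


def feature_vector(path_labels, label_inventory):
    """Count how often each label in the inventory occurs on the path."""
    counts = Counter(path_labels)
    return [counts[label] for label in label_inventory]


# Hypothetical, hand-written parse of "PROT1 binds PROT2":
# each edge is (head, dependent, dependency label).
edges = [("binds", "PROT1", "nsubj"), ("binds", "PROT2", "dobj")]

# Assumed label inventory; a real parser defines its own label set.
LABELS = ["nsubj", "dobj", "prep", "conj"]

path = dependency_path(edges, "PROT1", "PROT2")
vector = feature_vector(path, LABELS)
print(path, vector)
```

The resulting vector records only structural information (which dependency relations connect the two protein mentions), not the verb "binds" itself, so it could be passed to any standard classifier without relying on an interaction keyword list.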
