Extracting Interacting Protein Pairs and Evidence Sentences by using Dependency Parsing and Machine Learning Techniques

The biomedical literature is growing rapidly. This increases the need for developing text mining techniques to automatically extract biologically important information such as protein-protein interactions from free texts. Besides identifying an interaction and the interacting pair of proteins, it is also important to extract from the full text the most relevant sentences describing that interaction. These issues were addressed in the BioCreAtIvE II (Critical Assessment for Information Extraction in Biology) challenge evaluation as sub-tasks under the protein-protein interaction extraction (PPI) task. We present our approach of using dependency parsing and machine learning techniques to identify interacting protein pairs from full text articles (Protein Interaction Pairs Sub-task 2 (IPS)) and extracting the most relevant sentences that describe their interaction (Protein Interaction Sentences Sub-task 3 (ISS)).

[1]  Adwait Ratnaparkhi,et al.  A Maximum Entropy Approach to Identifying Sentence Boundaries , 1997, ANLP.

[2]  Rolf Apweiler,et al.  The SWISS-PROT protein sequence data bank and its supplement TrEMBL , 1997, Nucleic Acids Res..

[3]  B. Schölkopf,et al.  Advances in kernel methods: support vector learning , 1999 .

[4]  Miguel A. Andrade-Navarro,et al.  Automatic Extraction of Biological Information from Scientific Text: Protein-Protein Interactions , 1999, ISMB.

[5]  Rolf Apweiler,et al.  The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000 , 2000, Nucleic Acids Res..

[6]  James Pustejovsky,et al.  Robust Relational Parsing Over Biomedical Literature: Extracting Inhibit Relations , 2001, Pacific Symposium on Biocomputing.

[7]  Joel D. Martin,et al.  PreBIND and Textomy – mining the biomedical literature for protein-protein interactions using a support vector machine , 2003, BMC Bioinformatics.

[8]  Masatoshi Yoshikawa,et al.  Extracting Information on Protein-Protein Interactions from Biological Literature Based on Machine Learning Approaches , 2003 .

[9]  Anton Yuryev,et al.  Extracting human protein interactions from MEDLINE using a full-sentence parser , 2004, Bioinform..

[10]  Rohit J. Kate,et al.  Comparative experiments on learning information extractors for proteins and their interactions , 2005, Artif. Intell. Medicine.

[11]  Masaki Murata,et al.  Extracting Protein-Protein Interaction Information from Biomedical Text with SVM , 2006, IEICE Trans. Inf. Syst..

[12]  Gabriele Ausiello,et al.  MINT: the Molecular INTeraction database , 2006, Nucleic Acids Res..