PIE: an online prediction system for protein–protein interactions from text

Protein–protein interaction (PPI) extraction has been an important research topic in the bio-text mining area, since PPI information is critical for understanding biological processes. However, very few open systems are available on the Web, and most of them focus on keyword searching based on predefined PPIs. PIE (Protein Interaction information Extraction system) is a configurable Web service that extracts PPIs from literature, including user-provided papers as well as PubMed articles. Once the user provides abstracts or papers, the prediction results are displayed in an easily readable form with essential yet compact features. The PIE interface also supports additional features, such as PDF file extraction, a PubMed search tool and network communication, which are useful for biologists and bio-system developers. The PIE system uses natural language processing techniques and machine learning methodologies to predict PPI sentences, which results in high-precision performance for Web users. PIE is freely available at http://bi.snu.ac.kr/pie/.
