MP-PIPE: a massively parallel protein-protein interaction prediction engine

Interactions among proteins are essential to many biological functions in living cells, but experimentally detected interactions represent only a small fraction of the real interaction network. Computational protein interaction prediction methods have therefore become an important complement to experimental methods, in particular sequence-based methods that do not require additional data such as homologous sequences or 3D structure information, which are often unavailable. Our Protein Interaction Prediction Engine (PIPE) falls into this category. Park [3] recently compared PIPE with competing methods and concluded that our method "significantly outperforms the others in terms of recall-precision across both the yeast and human data". Here, we present MP-PIPE, a new massively parallel PIPE implementation for large-scale, high-throughput protein interaction prediction. MP-PIPE enabled us to perform the first complete scan of the entire human protein interaction network, a massively parallel computational experiment that took three months of continuous (24/7) computation on a dedicated Sun UltraSPARC T2+ based cluster with 50 nodes, 800 processor cores, and 6,400 hardware-supported threads. The implications for understanding human cell function are expected to be significant, as biologists are starting to analyze the 130,470 new protein interactions and possible new pathways in human cells predicted by MP-PIPE.

[1] D. Eisenberg, et al. Detecting protein function and protein-protein interactions from genome sequences, 1999, Science.

[2] M. Vidal, et al. Effect of sampling on topology predictions of protein-protein interaction networks, 2005, Nature Biotechnology.

[3] Yungki Park, et al. Critical assessment of sequence-based protein-protein interaction prediction methods that do not require homologous protein sequences, 2009, BMC Bioinformatics.

[4] Anton J. Enright, et al. Protein interaction maps for complete genomes based on gene fusion events, 1999, Nature.

[5] M. Mann, et al. Proteomics to study genes and genomes, 2000, Nature.

[6] Brian Raught, et al. Advances in protein complex analysis using mass spectrometry, 2005, The Journal of physiology.

[7] R. Becklin, et al. An integrated strategy for the discovery of drug targets by the analysis of protein–protein interactions, 2004.

[8] Oliviero Carugo, et al. Computational approaches to protein-protein interaction, 2004, Journal of Structural and Functional Genomics.

[9] Jean-Philippe Vert, et al. A tree kernel to analyse phylogenetic profiles, 2002, ISMB.

[10] S. Fields, et al. A novel genetic system to detect protein–protein interactions, 1989, Nature.

[11] Baldomero Oliva, et al. Prediction of protein-protein interactions using distant conservation of sequence patterns and structure relationships, 2005, Bioinform.

[12] H. Herzel, et al. Is there a bias in proteome research?, 2001, Genome research.

[13] Juwen Shen, et al. Predicting protein–protein interactions based only on sequences information, 2007, Proceedings of the National Academy of Sciences.

[14] Ashkan Golshani, et al. Computational methods for predicting protein-protein interactions, 2008, Advances in biochemical engineering/biotechnology.

[15] Ashkan Golshani, et al. Large-Scale Protein-Protein Interaction Detection Approaches: Past, Present and Future, 2008.

[16] D. Eisenberg, et al. A combined algorithm for genome-wide prediction of protein function, 1999, Nature.

[17] Darby Tien-Hao Chang, et al. Predicting protein-protein interactions in unbalanced data using the primary structure of proteins, 2010, BMC Bioinformatics.

[18] Dong-Soo Han, et al. PreSPI: a domain combination based prediction system for protein-protein interaction, 2004, Nucleic acids research.

[19] A. Mendelsohn, et al. Protein Interaction Methods-Toward an Endgame, 1999, Science.

[20] E. Sprinzak, et al. Correlated sequence-signatures as markers of protein-protein interaction, 2001, Journal of molecular biology.

[21] J. R. Green, et al. Global investigation of protein–protein interactions in yeast Saccharomyces cerevisiae using re-occurring short polypeptide sequences, 2008, Nucleic acids research.

[22] M. O. Dayhoff, et al. A Model of Evolutionary Change in Proteins, 1978.

[23] M. Morris, et al. The Design, 1998.

[24] Nagiza F. Samatova, et al. Efficient data access for parallel BLAST, 2005, 19th IEEE International Parallel and Distributed Processing Symposium.

[25] Gary D Bader, et al. Computational Prediction of Protein–Protein Interactions, 2008, Molecular biotechnology.

[26] Emmanuel D Levy, et al. Evolution and dynamics of protein interactions and networks, 2008, Current opinion in structural biology.

[27] Yanzhi Guo, et al. Using support vector machine combined with auto covariance to predict protein–protein interactions from protein sequences, 2008, Nucleic acids research.

[28] David A. Gough, et al. Whole-proteome interaction mining, 2003, Bioinform.

[29] Albert Chan, et al. PIPE: a protein-protein interaction prediction engine based on the re-occurring short polypeptide sequences between known interacting protein pairs, 2006, BMC Bioinformatics.

[30] James R. Knight, et al. A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae, 2000, Nature.

[31] Jean-Loup Faulon, et al. Predicting protein-protein interactions using signature products, 2005, Bioinform.

[32] R. Sharan, et al. Network-based prediction of protein function, 2007, Molecular systems biology.

[33] Wan Kyu Kim, et al. Large scale statistical prediction of protein-protein interaction by potentially interacting domain (PID) pair, 2002, Genome informatics. International Conference on Genome Informatics.

[34] Joel S. Bader, et al. Precision and recall estimates for two-hybrid screens, 2008, Bioinform.

[35] T. Lane, et al. Exploiting Amino Acid Composition for Predicting Protein-Protein Interactions, 2009, PloS one.

[36] R. Overbeek, et al. The use of gene clusters to infer functional coupling, 1999, Proceedings of the National Academy of Sciences of the United States of America.

[37] Vijay S. Pande, et al. Folding@home: Lessons from eight years of volunteer distributed computing, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.

[38] Afonso Ferreira, et al. Efficient Parallel Graph Algorithms for Coarse-Grained Multicomputers and BSP, 2002, Algorithmica.

[39] R. Ozawa, et al. A comprehensive two-hybrid analysis to explore the yeast protein interactome, 2001, Proceedings of the National Academy of Sciences of the United States of America.

[40] Michelle R. Arkin, et al. Small-molecule inhibitors of protein–protein interactions: progressing towards the dream, 2004, Nature Reviews Drug Discovery.

[41] Robert B. Russell, et al. 3did: interacting protein domains of known three-dimensional structure, 2004, Nucleic Acids Res.

[42] Robert B. Russell, et al. InterPreTS: protein Interaction Prediction through Tertiary Structure, 2003, Bioinform.

[43] Ashkan Golshani, et al. Short Co-occurring Polypeptide Regions Can Predict Global Protein Interaction Maps, 2012, Scientific Reports.

[44] Luonan Chen, et al. Inferring protein interactions from experimental data by association probabilistic method, 2006, Proteins.

[45] J. Matthews, et al. Protein-protein interactions in human disease, 2005, Current opinion in structural biology.

[46] Minghua Deng, et al. Inferring Domain–Domain Interactions From Protein–Protein Interactions, 2002.

[47] Ozlem Keskin, et al. PRISM: protein interactions by structural matching, 2005, Nucleic Acids Res.

[48] Wu-chun Feng, et al. The design, implementation, and evaluation of mpiBLAST, 2003.

[49] A. Valencia, et al. Similarity of phylogenetic trees as indicator of protein-protein interaction, 2001, Protein engineering.