Binding Site Prediction for Protein-Protein Interactions and Novel Motif Discovery using Re-occurring Polypeptide Sequences

Background

While there are many methods for predicting protein-protein interactions, very few can determine the specific site of interaction on each protein. Characterization of the specific sequence regions mediating interaction (binding sites) is crucial for an understanding of cellular pathways. Experimental methods often report false binding sites due to experimental limitations, while computational methods tend to require data that are not available at the proteome scale. Here we present PIPE-Sites, a novel method of protein-specific binding site prediction based on pairs of re-occurring polypeptide sequences, which have previously been shown to accurately predict protein-protein interactions. PIPE-Sites operates at high specificity and requires only the sequences of the query proteins and a database of known binary interactions with no binding site data, making it applicable to binding site prediction at the proteome scale.

Results

PIPE-Sites was evaluated using a dataset of 265 yeast and 423 human interacting protein pairs with experimentally determined binding sites. When applied to the same dataset, PIPE-Sites predictions were closer to the confirmed binding sites than those of two existing binding site prediction methods based on domain-domain interactions. Finally, we applied PIPE-Sites to two datasets of 2,347 yeast and 14,438 human novel protein pairs predicted to interact with high confidence. An analysis of the predicted interaction sites revealed a number of protein subsequences that are highly re-occurring in binding sites and may represent novel binding motifs.

Conclusions

PIPE-Sites is an accurate method for predicting protein binding sites and is applicable at the proteome scale. Thus, PIPE-Sites could be useful for exhaustive analysis of protein binding patterns in whole proteomes as well as for the discovery of novel binding motifs. PIPE-Sites is available online at http://pipe-sites.cgmlab.org/.
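The Results compare methods by how close each predicted site lies to the confirmed binding site. The abstract does not state the distance measure used, so the following is only a minimal sketch of one plausible way to quantify closeness, assuming each site is an inclusive range of residue positions. The names `interval_gap` and `pair_distance` are hypothetical and are not taken from PIPE-Sites.

```python
from typing import Tuple

# Assumption: a site is an inclusive (start, end) range of residue positions.
Interval = Tuple[int, int]


def interval_gap(predicted: Interval, confirmed: Interval) -> int:
    """Distance in residues between the nearest ends of two sites.

    Returns 0 when the two ranges share at least one residue.
    """
    latest_start = max(predicted[0], confirmed[0])
    earliest_end = min(predicted[1], confirmed[1])
    return max(0, latest_start - earliest_end)


def pair_distance(pred_a: Interval, conf_a: Interval,
                  pred_b: Interval, conf_b: Interval) -> int:
    """Hypothetical score for an interacting pair: sum of the gaps on protein A and protein B."""
    return interval_gap(pred_a, conf_a) + interval_gap(pred_b, conf_b)


if __name__ == "__main__":
    # Site on protein A overlaps the confirmed site; site on protein B misses it.
    print(pair_distance((10, 20), (15, 30), (40, 50), (60, 75)))
```

Under this measure a lower score means a closer prediction, and a score of 0 means both predicted sites overlap their confirmed sites. Other choices, such as distance between site midpoints or fractional overlap, would be equally reasonable.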
