A model based on minimotifs for classification of stable protein-protein complexes

Prediction of protein-protein interactions (PPIs) is an important problem in biology, since interactions play a key role in most biological processes and functions in living cells. PPIs have been studied from many perspectives. One important problem is the prediction of different complex types, such as obligate vs. non-obligate and transient vs. permanent. We focus on the prediction of obligate protein complexes, which are more stable and perform a specific function, as opposed to transient and non-obligate complexes, which last for a short period of time. We model the prediction problem using minimotifs, also known as short linear motifs, to extract the information contained in protein sequences that distinguishes obligate from non-obligate PPIs. Combining these features with linear dimensionality reduction (LDR) and classifiers such as k-nearest neighbor (k-NN) and the support vector machine (SVM) yields a powerful prediction scheme. On two well-known datasets, the model delivers classification accuracies as high as 99%. Analysis and cross-dataset validation show that the information contained in the training sequences is crucial for predicting and determining stability in PPIs.
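
The general idea of turning motif occurrences into features and classifying a complex with k-NN can be illustrated with a minimal sketch. It is not the authors' pipeline: the motif patterns, the toy sequences, the labels, and the helper names `motif_features` and `knn_predict` are all hypothetical, and plain occurrence counts stand in for whatever motif-derived features the model actually uses.

```python
import math
import re
from collections import Counter

# Hypothetical motif patterns, chosen only for illustration.
MOTIFS = {"PxxP": r"P..P", "RGD": r"RGD"}


def motif_features(chain_a, chain_b):
    """Count (possibly overlapping) motif matches summed over both chains."""
    features = []
    for pattern in MOTIFS.values():
        # A lookahead lets overlapping occurrences be counted separately.
        regex = re.compile(f"(?=({pattern}))")
        count = len(regex.findall(chain_a)) + len(regex.findall(chain_b))
        features.append(count)
    return features


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors closest to the query."""
    ranked = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training complexes (made-up sequences and labels).
TRAIN = [
    (("PAAPAAP", "MKPLLP"), "obligate"),
    (("PGGPKK", "APTTPA"), "obligate"),
    (("MRGDKRGD", "AAKK"), "non-obligate"),
    (("KKRGDA", "RGDLL"), "non-obligate"),
]

train_x = [motif_features(a, b) for (a, b), _ in TRAIN]
train_y = [label for _, label in TRAIN]

query = motif_features("APLLPGG", "MKPAAPK")
prediction = knn_predict(train_x, train_y, query, k=3)
print(query, prediction)
```

In a realistic setting the feature vectors would be much higher-dimensional, which is where a dimensionality reduction step before the classifier becomes useful.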
