Protein-Protein Interface Prediction based on a Novel SVM Speedup

Protein-protein interactions play a crucial role in many cellular processes. Predicting which amino acid residues appear in interaction sites helps decipher protein functions. Because a significant number of complexes have large interfaces, we hypothesize that complex formation follows the induced-fit mechanism rather than the lock-and-key mechanism. If so, interface regions should be characterized by the frequent appearance of unstructured or flexible amino acid residues. For this residue prediction problem, we designed a novel method called “tree decomposition support vector machine” (TDSVM) that can handle large samples. Previously, the number of protein chains used as training data was generally in the hundreds, whereas TDSVM extends it to thousands (4,064 in our case), yielding more than a million samples, each represented as a feature vector. TDSVM speeds up the training of kernel-based support vector machines (SVMs) by a factor of nearly 300, which allowed us to run numerous experiments to optimize the parameters and feature selection, work that would otherwise have taken months. As a result, we achieved high scores in F1-measure and Matthews correlation coefficient (MCC) using only protein-sequence information.
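For readers unfamiliar with the two evaluation scores named above, the following is a minimal sketch of how F1-measure and MCC are computed from per-residue binary labels (1 = interface residue, 0 = non-interface residue). The function names and the toy labels are illustrative assumptions and are not taken from the paper; only the standard definitions of the two scores are used.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = interface)."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 1:
            tp += 1
        elif p == 1 and t == 0:
            fp += 1
        elif p == 0 and t == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def f1_measure(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy labels for eight residues (hypothetical, for illustration only).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(f"F1 = {f1_measure(tp, fp, fn):.3f}, MCC = {mcc(tp, fp, tn, fn):.3f}")
```

Unlike accuracy, both scores remain informative when interface residues are a small minority of all residues, which is typically the case in interface prediction; MCC additionally takes true negatives into account.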
