Improving Protein–Protein Interaction Prediction Based on Phylogenetic Information Using a Least‐Squares Support Vector Machine

Abstract:  Predicting protein–protein interactions has become a key step in reverse‐engineering biological networks to better understand cellular functions. Experimental methods for determining protein–protein interactions are time‐consuming and costly, which has motivated vigorous development of computational approaches for predicting these interactions. A set of recently developed bioinformatics methods uses coevolutionary information about the interacting partners, for example the correlation between distance matrices, where, for each protein, a matrix stores the pairwise distances between the protein and its orthologs in a group of reference genomes. We propose a novel method that accounts for the intra‐matrix correlations to improve predictive accuracy. The distance matrices for a pair of proteins are transformed and concatenated into a phylogenetic vector. A least‐squares support vector machine is trained and tested on pairs of proteins, represented as phylogenetic vectors, whose interactions are known. The intra‐matrix correlations are accounted for by introducing a weighted linear kernel, which determines the dot product of two phylogenetic vectors. Performance, measured as the receiver operating characteristic (ROC) score in cross‐validation experiments, shows that our method (ROC score 0.928) improves significantly on the score obtained with Pearson correlations (0.659).
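
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: the "transformation" of each distance matrix is taken to be a flattening of its strict upper triangle, the kernel weights `w` are a hypothetical per-feature vector (the abstract does not say how the weights are derived), and the function names are invented here. The least‐squares SVM is trained by solving the standard linear system of Suykens and Vandewalle's classifier formulation, and the Pearson score is the baseline the abstract compares against.

```python
import numpy as np

def upper_triangle(dist):
    """Flatten the strict upper triangle of a symmetric distance matrix."""
    i, j = np.triu_indices(dist.shape[0], k=1)
    return dist[i, j]

def phylogenetic_vector(dist_a, dist_b):
    """Concatenate the flattened matrices of two proteins (assumed transform)."""
    return np.concatenate([upper_triangle(dist_a), upper_triangle(dist_b)])

def pearson_score(dist_a, dist_b):
    """Baseline: Pearson correlation between the two distance matrices."""
    return np.corrcoef(upper_triangle(dist_a), upper_triangle(dist_b))[0, 1]

def weighted_linear_kernel(X, Z, w):
    """K[i, j] = sum_k w[k] * X[i, k] * Z[j, k]; w is a hypothetical weight vector."""
    return (X * w) @ Z.T

def lssvm_fit(K, y, gamma=1.0):
    """Solve the LS-SVM classifier system for (alpha, b).

    [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1],
    with Omega[i, j] = y[i] * y[j] * K[i, j] and labels y in {-1, +1}.
    """
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = np.outer(y, y) * K + np.eye(n) / gamma
    rhs = np.concatenate([[0.0], np.ones(n)])
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]

def lssvm_decision(K_test_train, y, alpha, b):
    """Decision values f(x) = sum_i alpha[i] * y[i] * K(x, x_i) + b."""
    return K_test_train @ (alpha * y) + b
```

In use, each row of `X` would be the phylogenetic vector of one protein pair with label +1 (interacting) or -1 (non‐interacting); the decision values on held‐out pairs can then be ranked to compute a ROC score. Setting `w` to all ones reduces the kernel to the ordinary dot product.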
