Going the Distance for Protein Function Prediction: A New Distance Metric for Protein Interaction Networks

In protein-protein interaction (PPI) networks, the function of a protein is often inferred from the functions of its direct interaction partners or, more generally, from some notion of network proximity among proteins in a local neighborhood. Prior methods typically measure proximity as shortest-path distance in the network, but this captures fine-grained neighborhood distinctions poorly: most proteins are only a few hops from each other, so many pairs tie in proximity. We introduce diffusion state distance (DSD), a new metric based on a graph diffusion property, designed to capture finer-grained distinctions in proximity for the transfer of functional annotation in PPI networks. We present a tool that takes a PPI network as input and outputs the DSD between every pair of proteins. We show that replacing the shortest-path metric with DSD improves the performance of classical function prediction methods across the board.
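To make the contrast with shortest-path distance concrete, the following is a minimal sketch rather than the tool described above. The abstract does not define DSD, so the sketch assumes a finite-horizon form: each protein is represented by the expected number of visits a k-step random walk started there makes to every node, and the distance between two proteins is the L1 distance between those visit vectors. The toy network `TOY_PPI`, the function names, and the default `k=4` are all hypothetical choices made for illustration.

```python
from collections import deque

# Hypothetical toy network (undirected, connected): node 0 is a small hub.
TOY_PPI = {
    0: [1, 2, 3],
    1: [0],
    2: [0],
    3: [0, 4],
    4: [3],
}


def hop_distances(adj, source):
    """Shortest-path (hop) distance from source to every reachable node, via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def expected_visits(adj, source, k):
    """Expected number of visits to each node by a k-step random walk from source.

    The walk moves to a uniformly random neighbor at each step; step 0 (the
    start node itself) is counted. Assumes every node has at least one neighbor.
    """
    prob = {n: 0.0 for n in adj}
    prob[source] = 1.0
    visits = dict(prob)
    for _ in range(k):
        nxt = {n: 0.0 for n in adj}
        for u, p in prob.items():
            if p:
                share = p / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
        prob = nxt
        for n in adj:
            visits[n] += prob[n]
    return visits


def diffusion_distance(adj, u, v, k=4):
    """Assumed finite-k diffusion distance: L1 distance between visit vectors."""
    hu = expected_visits(adj, u, k)
    hv = expected_visits(adj, v, k)
    return sum(abs(hu[n] - hv[n]) for n in adj)


if __name__ == "__main__":
    hops = hop_distances(TOY_PPI, 1)
    for target in (2, 3):
        print(target, hops[target], diffusion_distance(TOY_PPI, 1, target))
```

In this toy network, proteins 2 and 3 are both two hops from protein 1, so shortest-path distance cannot rank them. The diffusion-based distance separates the two: protein 2 shares protein 1's only neighbor and is structurally interchangeable with it, while protein 3 also connects to protein 4, so its walks spread differently.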
