How and when should interactome-derived clusters be used to predict functional modules and protein function?

Motivation: Clustering of protein–protein interaction networks is one of the most common approaches for predicting functional modules, protein complexes and protein functions. But how well does clustering perform at these tasks?

Results: We develop a general framework to assess how well computationally derived clusters in physical interactomes overlap functional modules derived via the Gene Ontology (GO). Using this framework, we evaluate six diverse network clustering algorithms on Saccharomyces cerevisiae and show that (i) the performances of these algorithms can differ substantially when run on the same network and (ii) their relative performances change depending upon the topological characteristics of the network under consideration. For the specific task of function prediction in S. cerevisiae, we demonstrate that, surprisingly, a simple non-clustering guilt-by-association approach outperforms widely used clustering-based approaches that annotate a protein with the overrepresented biological process and cellular component terms in its cluster; this holds across the full range of clustering algorithms considered. Further analysis parameterizes performance by the number of annotated proteins and suggests when clustering approaches should be used for interactome functional analyses. Overall, our results suggest a re-examination of when and how clustering approaches should be applied to physical interactomes, and establish guidelines by which novel clustering approaches for biological networks should be justified and evaluated with respect to functional analysis.

Contact: msingh@cs.princeton.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
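The two function-prediction strategies contrasted in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the majority-vote rule over direct neighbors, the hypergeometric test without multiple-testing correction, and the toy network with its term labels are all assumptions made for illustration.

```python
from collections import Counter
from math import comb


def neighbor_vote(protein, edges, annotations):
    """Guilt by association: return the term(s) most frequent among direct neighbors."""
    neighbors = {b if a == protein else a for a, b in edges if protein in (a, b)}
    neighbors.discard(protein)  # ignore self-interactions
    counts = Counter(t for n in neighbors for t in annotations.get(n, ()))
    if not counts:
        return set()
    top = max(counts.values())
    return {t for t, c in counts.items() if c == top}


def hypergeom_tail(k, K, n, N):
    """P(X >= k) when n proteins are drawn from N, of which K carry the term."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def cluster_enriched_terms(cluster, annotations, alpha=0.05):
    """Cluster-based annotation: terms overrepresented among a cluster's annotated members.

    The background is the set of annotated proteins (an assumption of this sketch).
    """
    members = [p for p in cluster if p in annotations]
    N, n = len(annotations), len(members)
    background = Counter(t for terms in annotations.values() for t in terms)
    in_cluster = Counter(t for p in members for t in annotations[p])
    return {t for t, k in in_cluster.items()
            if hypergeom_tail(k, background[t], n, N) <= alpha}


# Hypothetical toy interactome; protein "A" is unannotated.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("D", "F")]
annotations = {
    "B": {"ribosome biogenesis"}, "C": {"ribosome biogenesis"},
    "D": {"DNA repair"}, "E": {"DNA repair"}, "F": {"DNA repair"},
}

gba_prediction = neighbor_vote("A", edges, annotations)
cluster_prediction = cluster_enriched_terms({"A", "B", "C"}, annotations, alpha=0.2)
```

In the cluster-based variant, every protein in a cluster would receive the terms returned by `cluster_enriched_terms`, so the prediction depends on how the clustering algorithm drew the cluster boundary; the neighbor vote depends only on a protein's direct interaction partners. The loose `alpha=0.2` is used only because the toy network is too small for any term to reach a conventional significance level.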
