Revealing Biological Modules via Graph Summarization

The division of a protein interaction network into biologically meaningful modules can aid automated detection of protein complexes and prediction of biological processes, and can uncover the global organization of the cell. We propose the use of a graph summarization (GS) technique, based on graph compression, to cluster protein interaction graphs into biologically relevant modules. The method is motivated by defining a biological module as a set of proteins that have similar sets of interaction partners. We show that this definition, put into practice by a GS algorithm, reveals modules that are more biologically enriched than those found by other methods. We also apply GS to predict complex memberships, biological processes, and co-complexed pairs, and show that in most settings GS is preferable to existing methods of protein interaction graph clustering.
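
To make the module definition above concrete, the following minimal sketch scores how similar two proteins' sets of interaction partners are. It is an illustration only, not the GS algorithm itself: the Jaccard index is swapped in as a simple stand-in similarity measure, and the function names (`neighbor_sets`, `jaccard`, `partner_similarity`) and the toy edge list are hypothetical.

```python
from itertools import combinations


def neighbor_sets(edges):
    """Map each protein to the set of its interaction partners."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs


def jaccard(a, b):
    """Jaccard index of two sets (assumed stand-in measure, not from the paper)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def partner_similarity(edges):
    """Score every protein pair by the overlap of their partner sets."""
    nbrs = neighbor_sets(edges)
    return {
        (u, v): jaccard(nbrs[u], nbrs[v])
        for u, v in combinations(sorted(nbrs), 2)
    }


# Toy network (hypothetical): A and B share both partners but do not
# interact with each other; C has an unrelated partner.
edges = [("A", "X"), ("A", "Y"), ("B", "X"), ("B", "Y"), ("C", "Z")]
sim = partner_similarity(edges)
```

In this toy network, A and B have identical partner sets even though no edge joins them, so a partner-based definition would group them, whereas a criterion based purely on direct connections between the two would not.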
