Finding local communities in protein networks

BackgroundProtein-protein interactions (PPIs) play fundamental roles in nearly all biological processes, and provide major insights into the inner workings of cells. A vast amount of PPI data for various organisms is available from BioGRID and other sources. The identification of communities in PPI networks is of great interest because they often reveal previously unknown functional ties between proteins. A large number of global clustering algorithms have been applied to protein networks, where the entire network is partitioned into clusters. Here we take a different approach by looking for local communities in PPI networks.ResultsWe develop a tool, named Local Protein Community Finder, which quickly finds a community close to a queried protein in any network available from BioGRID or specified by the user. Our tool uses two new local clustering algorithms Nibble and PageRank-Nibble, which look for a good cluster among the most popular destinations of a short random walk from the queried vertex. The quality of a cluster is determined by proportion of outgoing edges, known as conductance, which is a relative measure particularly useful in undersampled networks. We show that the two local clustering algorithms find communities that not only form excellent clusters, but are also likely to be biologically relevant functional components. We compare the performance of Nibble and PageRank-Nibble to other popular and effective graph partitioning algorithms, and show that they find better clusters in the graph. Moreover, Nibble and PageRank-Nibble find communities that are more functionally coherent.ConclusionThe Local Protein Community Finder, accessible at http://xialab.bu.edu/resources/lpcf, allows the user to quickly find a high-quality community close to a queried protein in any network available from BioGRID or specified by the user. We show that the communities found by our tool form good clusters and are functionally coherent, making our application useful for biologists who wish to investigate functional modules that a particular protein is a part of.

[1]  Jianbo Shi,et al.  Learning Segmentation by Random Walks , 2000, NIPS.

[2]  Haiyuan Yu,et al.  Developing a similarity measure in biological function space , 2007 .

[3]  Sean R. Collins,et al.  Global landscape of protein complexes in the yeast Saccharomyces cerevisiae , 2006, Nature.

[4]  Andrew B. Kahng,et al.  New spectral methods for ratio cut partitioning and clustering , 1991, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..

[5]  James R. Lee,et al.  Eigenvalue Bounds, Spectral Partitioning, and Metrical Deformations via Flows , 2008, 2008 49th Annual IEEE Symposium on Foundations of Computer Science.

[6]  Stephen B. Seidman,et al.  Network structure and minimum degree , 1983 .

[7]  Yan Wang,et al.  VisANT 3.5: multi-scale network visualization, analysis and inference based on the gene ontology , 2009, Nucleic Acids Res..

[8]  Charles J. Alpert,et al.  Spectral Partitioning: The More Eigenvectors, The Better , 1995, 32nd Design Automation Conference.

[9]  Jennifer Widom,et al.  Scaling personalized web search , 2003, WWW '03.

[10]  L. Mirny,et al.  Protein complexes and functional modules in molecular networks , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[11]  J. Rodgers,et al.  Thirteen ways to look at the correlation coefficient , 1988 .

[12]  Jon M. Kleinberg,et al.  Inferring Web communities from link topology , 1998, HYPERTEXT '98.

[13]  Mark Newman,et al.  Detecting community structure in networks , 2004 .

[14]  Jonathan A. Kelner Spectral partitioning, eigenvalue bounds, and circle packings for graphs of bounded genus , 2004, STOC '04.

[15]  Shang-Hua Teng,et al.  Spectral partitioning works: planar graphs and finite element meshes , 1996, Proceedings of 37th Conference on Foundations of Computer Science.

[16]  Fan Chung Graham,et al.  Local Graph Partitioning using PageRank Vectors , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[17]  Andrei Z. Broder,et al.  Workshop on Algorithms and Models for the Web Graph , 2007, WAW.

[18]  Dániel Fogaras,et al.  Towards Scaling Fully Personalized PageRank , 2004, WAW.

[19]  Gary D. Bader,et al.  An automated method for finding molecular complexes in large protein interaction networks , 2003, BMC Bioinformatics.

[20]  A. Clauset Finding local community structure in networks. , 2005, Physical review. E, Statistical, nonlinear, and soft matter physics.

[21]  Shoshana J. Wodak,et al.  Markov clustering versus affinity propagation for the partitioning of protein interaction graphs , 2009, BMC Bioinformatics.

[22]  Mark Gerstein,et al.  Predicting interactions in protein networks by completing defective cliques , 2006, Bioinform..

[23]  George Karypis,et al.  Multilevel algorithms for partitioning power-law graphs , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.

[24]  YuanBo,et al.  Detecting functional modules in the yeast protein--protein interaction network , 2006 .

[25]  Rajeev Motwani,et al.  The PageRank Citation Ranking : Bringing Order to the Web , 1999, WWW 1999.

[26]  M E J Newman,et al.  Community structure in social and biological networks , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[27]  Igor Jurisica,et al.  Protein complex prediction via cost-based clustering , 2004, Bioinform..

[28]  Andrew B. Kahng,et al.  Spectral Partitioning with Multiple Eigenvectors , 1999, Discret. Appl. Math..

[29]  Mark Jerrum,et al.  Approximate Counting, Uniform Generation and Rapidly Mixing Markov Chains , 1987, WG.

[30]  Sergey Brin,et al.  The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.

[31]  David Martin,et al.  Functional classification of proteins for the prediction of cellular function from a protein-protein interaction network , 2003, Genome Biology.

[32]  Jingchun Chen,et al.  Detecting functional modules in the yeast protein-protein interaction network , 2006, Bioinform..

[33]  T. Vicsek,et al.  Uncovering the overlapping community structure of complex networks in nature and society , 2005, Nature.

[34]  Ravi Kumar,et al.  Trawling the Web for Emerging Cyber-Communities , 1999, Comput. Networks.

[35]  Santosh S. Vempala,et al.  On clusterings-good, bad and spectral , 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[36]  C. Lee Giles,et al.  Self-Organization and Identification of Web Communities , 2002, Computer.

[37]  M E J Newman,et al.  Fast algorithm for detecting community structure in networks. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[38]  Jonathan A. Kelner Spectral Partitioning, Eigenvalue Bounds, and Circle Packings for Graphs of Bounded Genus , 2006, SIAM J. Comput..

[39]  D. Bu,et al.  Topological structure analysis of the protein-protein interaction network in budding yeast. , 2003, Nucleic acids research.

[40]  A. Barabasi,et al.  High-Quality Binary Protein Interaction Map of the Yeast Interactome Network , 2008, Science.

[41]  László Lovász,et al.  Approximating clique is almost NP-complete , 1991, [1991] Proceedings 32nd Annual Symposium of Foundations of Computer Science.

[42]  Mike Tyers,et al.  BioGRID: a general repository for interaction datasets , 2005, Nucleic Acids Res..

[43]  David Botstein,et al.  GO: : TermFinder--open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes , 2004, Bioinform..

[44]  R. Carter 11 – IT and society , 1991 .

[45]  Kara Dolinski,et al.  Gene Ontology annotations at SGD: new data sources and annotation methods , 2007, Nucleic Acids Res..

[46]  M. Vidal,et al.  Effect of sampling on topology predictions of protein-protein interaction networks , 2005, Nature Biotechnology.

[47]  Richard M. Karp,et al.  Reducibility Among Combinatorial Problems , 1972, 50 Years of Integer Programming.

[48]  Mark Gerstein,et al.  Total ancestry measure: quantifying the similarity in tree-like classification, with genomic applications , 2007, Bioinform..