BoCluSt: bootstrap clustering stability algorithm for community detection in networks

The identification of modules or communities in sets of related variables is a key step in the analysis and modeling of biological systems. Procedures for this identification are usually designed to allow fast analyses of very large datasets and may produce suboptimal results when the datasets are of small to moderate size. This article introduces BoCluSt, a new and somewhat more computationally intensive community detection procedure that combines a clustering algorithm with a measure of stability under bootstrap resampling. Both computer simulations and analyses of experimental data showed that BoCluSt can outperform current procedures in the identification of multiple modules in datasets with a moderate number of variables. In addition, the procedure provides users with a null distribution of results against which to evaluate the support for the existence of community structure in the data. BoCluSt takes individual measures for a set of variables as input, and may be a valuable and robust exploratory tool for network analysis, as it provides (1) an estimate of the best partition of variables into modules, (2) a measure of the support for the existence of modular structure, and (3) an overall description of the whole structure, which may reveal hierarchical modularity, in which modules are composed of smaller sub-modules.
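The general idea of pairing a clustering algorithm with bootstrap stability can be illustrated with a minimal sketch. It is not the BoCluSt implementation: the abstract does not specify the clustering algorithm, the distance, the stability measure, or how the null distribution is built. The choices here (average-linkage clustering on 1 - |r|, the Rand index as agreement measure, a null obtained by shuffling each variable independently) and all function names are assumptions made for illustration only.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def cluster_variables(rows, k):
    """Average-linkage clustering of the columns of `rows` into k groups,
    using 1 - |r| as the distance between two variables (an assumed choice)."""
    cols = list(zip(*rows))
    dist = {(i, j): 1.0 - abs(pearson(cols[i], cols[j]))
            for i, j in combinations(range(len(cols)), 2)}

    def linkage(a, b):
        total = sum(dist[(min(i, j), max(i, j))] for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [[i] for i in range(len(cols))]
    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)
    labels = [0] * len(cols)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def rand_index(labels_a, labels_b):
    """Fraction of variable pairs on which two partitions agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)


def bootstrap_stability(rows, k, n_boot=30, rng=None):
    """Mean agreement between the full-data partition and partitions
    obtained after resampling individuals (rows) with replacement."""
    rng = rng or random.Random(0)
    reference = cluster_variables(rows, k)
    scores = []
    for _ in range(n_boot):
        sample = [rng.choice(rows) for _ in rows]
        scores.append(rand_index(reference, cluster_variables(sample, k)))
    return sum(scores) / len(scores)


def shuffle_within_variables(rows, rng):
    """Null data: shuffle each variable independently across individuals,
    which removes the correlations among variables (an assumed null model)."""
    cols = [list(c) for c in zip(*rows)]
    for c in cols:
        rng.shuffle(c)
    return [list(r) for r in zip(*cols)]


def simulate_two_modules(n=40, seed=1):
    """Toy data: six variables forming two modules of three, each module
    driven by its own latent factor plus independent noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        f1, f2 = rng.gauss(0, 1), rng.gauss(0, 1)
        rows.append([f1 + rng.gauss(0, 0.3) for _ in range(3)] +
                    [f2 + rng.gauss(0, 0.3) for _ in range(3)])
    return rows


if __name__ == "__main__":
    data = simulate_two_modules()
    for k in (2, 3, 4):
        observed = bootstrap_stability(data, k)
        null = bootstrap_stability(shuffle_within_variables(data, random.Random(k)), k)
        print(f"k={k}: stability={observed:.2f}, shuffled-data stability={null:.2f}")
```

In this sketch, a partition counts as well supported when its stability on the observed data clearly exceeds the stabilities obtained from many shuffled datasets. Comparing stabilities across several values of `k` gives a rough picture of the overall structure, including nested sub-modules. A single shuffled dataset, as in the demo loop, is only a placeholder for a full null distribution.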

[1]  M. Newman,et al.  Finding community structure in networks using the eigenvectors of matrices. , 2006, Physical review. E, Statistical, nonlinear, and soft matter physics.

[2]  G. von Dassow,et al.  Modularity in animal development and evolution: elements of a conceptual framework for EvoDevo. , 1999, The Journal of experimental zoology.

[3]  Jessica A. Bolker,et al.  Modularity in Development and Why It Matters to Evo-Devo1 , 2000 .

[4]  Andrea Lancichinetti,et al.  Community detection algorithms: a comparative analysis: invited presentation, extended abstract , 2009, VALUETOOLS.

[5]  S. Fortunato,et al.  Resolution limit in community detection , 2006, Proceedings of the National Academy of Sciences.

[6]  Martin Rosvall,et al.  Maps of Information Flow Reveal Community Structure In Complex Networks , 2007 .

[7]  Christian Borgelt,et al.  Resampling for Fuzzy Clustering , 2007, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[8]  J. Pereira-Leal,et al.  Modularity: Understanding the Development and Evolution of Natural Complex Systems , 2006 .

[9]  Santo Fortunato,et al.  Community detection in graphs , 2009, ArXiv.

[10]  Matteo Pardo,et al.  A stability based validity method for fuzzy clustering , 2010, Pattern Recognit..

[11]  M E J Newman,et al.  Community structure in social and biological networks , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[12]  Anil K. Jain,et al.  Algorithms for Clustering Data , 1988 .

[13]  Gábor Csárdi,et al.  The igraph software package for complex network research , 2006 .

[14]  J. Breckenridge Replicating Cluster Analysis: Method, Consistency, and Validity. , 1989, Multivariate behavioral research.

[15]  P. Innocenti,et al.  Experimental Evidence Supports a Sex-Specific Selective Sieve in Mitochondrial Genome Evolution , 2011, Science.

[16]  A. Arenas,et al.  Community detection in complex networks using extremal optimization. , 2005, Physical review. E, Statistical, nonlinear, and soft matter physics.

[17]  Matthieu Latapy,et al.  Computing Communities in Large Networks Using Random Walks , 2004, J. Graph Algorithms Appl..

[18]  D. Koller,et al.  A module map showing conditional activity of expression modules in cancer , 2004, Nature Genetics.

[19]  Eytan Domany,et al.  Resampling Method for Unsupervised Estimation of Cluster Validity , 2001, Neural Computation.

[20]  Benjamin M. Bolstad,et al.  affy - analysis of Affymetrix GeneChip data at the probe level , 2004, Bioinform..

[21]  A. Vespignani,et al.  The architecture of complex weighted networks. , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[22]  J. Olesen,et al.  Ecological modules and roles of species in heathland plant-insect flower visitor networks. , 2009, The Journal of animal ecology.

[23]  Jean-Loup Guillaume,et al.  Fast unfolding of communities in large networks , 2008, 0803.0476.

[24]  J. Reichardt,et al.  Partitioning and modularity of graphs with arbitrary degree distribution. , 2006, Physical review. E, Statistical, nonlinear, and soft matter physics.

[25]  Anil K. Jain Data clustering: 50 years beyond K-means , 2008, Pattern Recognit. Lett..

[26]  Roger Guimerà,et al.  Extracting the hierarchical organization of complex systems , 2007, Proceedings of the National Academy of Sciences.

[27]  M E J Newman,et al.  Finding and evaluating community structure in networks. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[28]  Chris Mungall,et al.  AmiGO: online access to ontology and annotation data , 2008, Bioinform..

[29]  A. Barabasi,et al.  Hierarchical Organization of Modularity in Metabolic Networks , 2002, Science.

[30]  Shilpa Chakravartula,et al.  Complex Networks: Structure and Dynamics , 2014 .

[31]  G. Wagner,et al.  The road to modularity , 2007, Nature Reviews Genetics.

[32]  M. Newman,et al.  Finding community structure in very large networks. , 2004, Physical review. E, Statistical, nonlinear, and soft matter physics.

[33]  M E J Newman,et al.  Modularity and community structure in networks. , 2006, Proceedings of the National Academy of Sciences of the United States of America.

[34]  Diego Rasskin-Gutman,et al.  Modularity. Understanding the Development and Evolution of Natural Complex Systems , 2005 .

[35]  Martin Rosvall,et al.  Maps of random walks on complex networks reveal community structure , 2007, Proceedings of the National Academy of Sciences.

[36]  Mehmet Gonullu,et al.  Department of Computer Science and Engineering , 2011 .

[37]  Bo-Juen Chen,et al.  Modularity and interactions in the genetics of gene expression , 2009, Proceedings of the National Academy of Sciences.

[38]  R. Guimerà,et al.  Functional cartography of complex metabolic networks , 2005, Nature.

[39]  Réka Albert,et al.  Near linear time algorithm to detect community structures in large-scale networks. , 2007, Physical review. E, Statistical, nonlinear, and soft matter physics.

[40]  L. Mirny,et al.  Protein complexes and functional modules in molecular networks , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[41]  A. Raftery,et al.  Model‐based clustering for social networks , 2007 .

[42]  A. Meyer,et al.  Resampling-Based Approaches to Study Variation in Morphological Modularity , 2013, PloS one.

[43]  Ulrik Brandes,et al.  On Modularity Clustering , 2008, IEEE Transactions on Knowledge and Data Engineering.

[44]  P. Rousseeuw Silhouettes: a graphical aid to the interpretation and validation of cluster analysis , 1987 .

[45]  Arend Hintze,et al.  Evolution of Complex Modular Biological Networks , 2007, PLoS Comput. Biol..

[46]  J. Reichardt,et al.  Statistical mechanics of community detection. , 2006, Physical review. E, Statistical, nonlinear, and soft matter physics.

[47]  G. Wagner HOMOLOGUES, NATURAL KINDS AND THE EVOLUTION OF MODULARITY , 1996 .

[48]  Christian Peter Klingenberg,et al.  Morphometric integration and modularity in configurations of landmarks: tools for evaluating a priori hypotheses , 2009, Evolution & development.

[49]  Patrick C Phillips,et al.  Network thinking in ecology and evolution. , 2005, Trends in ecology & evolution.

[50]  Joachim M. Buhmann,et al.  Stability-Based Validation of Clustering Solutions , 2004, Neural Computation.

[51]  Anat Kreimer,et al.  The evolution of modularity in bacterial metabolic networks , 2008, Proceedings of the National Academy of Sciences.

[52]  Pavel Tomancak,et al.  linkcomm: an R package for the generation, visualization, and analysis of link communities in networks of arbitrary size and type , 2011, Bioinform..

[53]  Sune Lehmann,et al.  Link communities reveal multiscale complexity in networks , 2009, Nature.

[54]  R Core Team,et al.  R: A language and environment for statistical computing. , 2014 .

[55]  Haijun Zhou Network landscape from a Brownian particle's perspective. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[56]  Pedro Jordano,et al.  Patterns of Mutualistic Interactions in Pollination and Seed Dispersal: Connectance, Dependence Asymmetries, and Coevolution , 1987, The American Naturalist.