Multistep greedy algorithm identifies community structure in real-world and computer-generated networks

We have recently introduced a multistep extension of the greedy algorithm for modularity optimization. The extension is based on the idea that merging l pairs of communities (l > 1) at each iteration prevents premature condensation into a few large communities. Here, we present an empirical formula for choosing the step width l that yields partitions with optimal or close-to-optimal modularity for 17 real-world and 1100 computer-generated networks. Furthermore, we analyze in depth the communities of two real-world networks: the metabolic network of the bacterium E. coli and the graph of co-appearing words in the titles of papers coauthored by Martin Karplus. This analysis provides evidence that the partition obtained by the multistep greedy algorithm is superior to the one generated by the original greedy algorithm, not only with respect to modularity but also according to objective criteria. In other words, the multistep extension of the greedy algorithm reduces the danger of getting trapped in local optima of modularity and generates more reasonable partitions.
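The multistep idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes an undirected, unweighted graph given as a list of edges with integer vertex labels, recomputes all modularity gains from scratch at every iteration for clarity rather than speed, allows each community to take part in at most one merge per iteration, and stops as soon as no merge increases modularity. The names `modularity`, `multistep_greedy`, and `step_width` are hypothetical, and the empirical formula for l is not reproduced here, so the step width is passed in by the caller.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    intra = 0
    degree_sum = defaultdict(int)
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        if cu == cv:
            intra += 1
        degree_sum[cu] += 1
        degree_sum[cv] += 1
    return intra / m - sum((d / (2 * m)) ** 2 for d in degree_sum.values())


def multistep_greedy(edges, step_width):
    """Merge up to `step_width` community pairs per iteration (illustrative sketch)."""
    m = len(edges)
    nodes = sorted({n for edge in edges for n in edge})
    community_of = {n: n for n in nodes}  # start with singleton communities

    while True:
        # Count edges between communities and the total degree of each community.
        between = defaultdict(int)
        degree_sum = defaultdict(int)
        for u, v in edges:
            cu, cv = community_of[u], community_of[v]
            degree_sum[cu] += 1
            degree_sum[cv] += 1
            if cu != cv:
                between[(min(cu, cv), max(cu, cv))] += 1

        # Modularity gain of merging i and j: 2 * (e_ij - a_i * a_j).
        gains = []
        for (ci, cj), k in between.items():
            dq = 2 * (k / (2 * m) - degree_sum[ci] * degree_sum[cj] / (2 * m) ** 2)
            if dq > 0:
                gains.append((dq, ci, cj))
        if not gains:
            break  # assumption: stop when no merge improves modularity

        # Largest gain first; ties broken by community label for determinism.
        gains.sort(key=lambda g: (-g[0], g[1], g[2]))

        touched = set()
        merged = 0
        for dq, ci, cj in gains:
            if merged == step_width:
                break
            if ci in touched or cj in touched:
                continue  # assumption: one merge per community per iteration
            for n in nodes:
                if community_of[n] == cj:
                    community_of[n] = ci
            touched.update((ci, cj))
            merged += 1

    return community_of


# Two triangles joined by a single bridge edge.
example_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
partition = multistep_greedy(example_edges, step_width=2)
print(partition, modularity(example_edges, partition))
```

Setting `step_width=1` gives the one-merge-per-iteration behavior of the original greedy scheme, so the same function can be used to compare the two variants on a given network.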

Amedeo Caflisch, et al. Multistep greedy algorithm identifies community structure in real-world and computer-generated networks, 2008, Physical Review E: Statistical, Nonlinear, and Soft Matter Physics.